Scenario Engine
Convex tracks macro scenarios with Bayesian probability estimates that update as new evidence arrives. Each scenario starts with a calibrated historical base rate, then evolves through a rigorous evidence model. A Scenario Radar discovers emerging risks autonomously across 6 data sources, while automated lifecycle management handles tiered tracking, demotion, resolution, and retirement. A 22-gate pipeline ensures published analysis meets institutional quality standards.
What Makes This Different
Most scenario analysis you’ll find — from banks, hedge funds, media — works like this: an analyst writes 3–4 scenarios in a PDF, assigns gut-feel probabilities, and publishes it. Six weeks later they might update it. Between updates, it’s static. There’s no feedback loop telling the analyst whether their last set of scenarios was any good.
Convex is building something fundamentally different: a unified, continuously updating, self-correcting scenario intelligence system that discovers emerging risks autonomously, validates them through multi-source evidence gating and causal chain tracing, maintains a coherent probability space across all tracked scenarios, and improves its own accuracy over time.
01 The Scenarios Are Alive
Most scenario analysis works like this: an analyst writes 3–4 scenarios in a PDF, assigns gut-feel probabilities, and publishes it. Six weeks later they might update it. Between updates, it’s static. The probabilities don’t move. New scenarios don’t appear. Old ones don’t retire.
Convex scenarios don’t sit in a PDF. Every 6 hours, the system re-evaluates every active scenario against fresh data — new FRED releases, price moves, news clusters, positioning data. Probabilities update continuously using Bayesian math (the same framework used in quantitative finance and intelligence analysis), not gut feel. If oil spikes 10% overnight, the Energy Supply Shock scenario’s probability adjusts before any human touches it.
02 Autonomous Scenario Discovery
Most platforms track a fixed list of scenarios someone wrote down. Convex runs a “radar” every 6 hours that scans 6 independent data sources (economic data, market prices, news clustering, futures positioning, proprietary composite indices, and cross-source divergences), looking for emerging macro configurations that don’t match anything currently being tracked.
If something new is forming, the system spots it, accumulates evidence over multiple days to filter out noise, and promotes it to active tracking when the evidence is strong enough. No human has to notice it first.
03 Multi-Source Evidence Gating
When a new potential scenario emerges, it has to pass three gates before it gets tracked: confirmed by at least 2 independent data source types (not just one news story), assessed as genuinely novel rather than a variant of something already tracked, and verified to produce meaningfully different investment implications.
This is how intelligence agencies validate threats — multiple independent sources, not one loud signal. It separates signal from noise institutionally, not intuitively.
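The three gates can be sketched as a single predicate. This is a minimal illustration, not the production logic: the `Candidate` class and its field names are hypothetical, and only the "at least 2 independent source types" rule is taken from the text.

```python
from dataclasses import dataclass

MIN_SOURCE_TYPES = 2  # "at least 2 independent data source types" (from the text)


@dataclass
class Candidate:
    """Hypothetical container for a candidate scenario awaiting promotion."""
    source_types: set[str]      # independent source categories that confirmed the signal
    is_novel: bool              # not a variant of an already-tracked scenario
    changes_implications: bool  # produces meaningfully different investment implications


def passes_emergence_gates(c: Candidate) -> bool:
    """All three gates must pass; a single loud source is never enough."""
    multi_source = len(c.source_types) >= MIN_SOURCE_TYPES
    return multi_source and c.is_novel and c.changes_implications
```

A candidate confirmed only by news fails the first gate regardless of how the other two are assessed.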
04 Causal Chain Tracing
The system has cascade templates — if oil spikes, it actively watches for the expected downstream sequence: energy news clustering within 24 hours, futures positioning shifts within 1–5 days, inflation expectation changes within 5–15 days.
Finding the full cascade confirms the signal is real. Finding no downstream confirmation after the expected window suggests it was noise. This is how experienced macro traders actually think — they trace transmission mechanisms — but nobody has automated it.
05 Automated Scenario Retirement
Each scenario has structured resolution conditions — specific, measurable thresholds (like “high-yield spreads below 400 basis points for 30 consecutive days”) that are checked automatically against live data.
When the conditions that made a scenario relevant are definitively gone, the system flags it for retirement. It also automatically demotes scenarios whose probability has been negligible for multiple cycles. This prevents zombie scenarios cluttering analysis long after they’ve stopped being relevant.
06 Dual-Brain Architecture
There’s an LLM (Claude) that’s creative and good at spotting novel patterns, and a Bayesian engine that’s disciplined and good at calibrated probability. They run in parallel.
The LLM’s scenario analysis is anchored to the Bayesian probabilities so it doesn’t invent wild numbers. The Bayesian engine ingests the LLM’s novel pattern detection so it doesn’t miss emerging configurations. Where the two disagree by more than 10 percentage points, that divergence is surfaced as an analytical signal — it means either the data hasn’t caught up to the narrative or the narrative is wrong.
07 Coherence Auditing
After every update, a coherence audit checks that the scenario set makes sense as a whole — mutually exclusive scenarios don’t sum to more than their ceiling, probability-weighted asset implications don’t produce logical contradictions, and conditional relationships are respected.
If the system says Credit Crisis probability rose but Fed Pivot probability didn’t shift accordingly, that gets flagged. No bank’s research desk does this automatically.
08 Self-Calibration
The calibration engine tracks whether probability assignments actually matched reality, whether asset price predictions were biased, which data sources were actually predictive for which types of scenarios, and whether the system was consistently early or late.
Those findings feed back into the system’s parameters. It literally gets smarter over time.
Individual pieces of this exist in quantitative hedge funds, but they’re siloed — the scenario team doesn’t talk to the data pipeline team doesn’t talk to the calibration team. Nobody has it as one integrated system, and certainly nobody publishes it.
Calibrated Base Rates
Every scenario starts with a calibrated base rate probability derived from a specific historical reference class. This is not a guess or a round number: it is the empirical frequency of similar conditions in the historical record, adjusted for current structural differences.
For example, a stagflation scenario might use the reference class “quarters with CPI >4% AND unemployment >5% since 1970”. The base rate is the fraction of such quarters that preceded a sustained stagflation episode.
Small reference class warning: Some scenarios (Fiscal Dominance with n=3, Trade War with n=8 years) have inherently small reference classes. This is documented in the limitations register, and market-implied comparison provides an external anchor.
Evidence Model
New evidence shifts the scenario probability via Bayesian log-odds updating. The evidence model has six stages, each applying a specific statistical technique.
Metric Selection
Each scenario monitors 5–20 key indicators selected for their economic relevance to the scenario thesis. Metrics track first-differences (changes, not levels) — this is critical. Using levels creates spurious correlation from trending series; first-differences isolate genuinely new information.
Z-Score Computation
For each metric, compute the z-score of the recent change against a rolling 252-day standard deviation of changes. This captures how unusual the recent movement is relative to the metric's own volatility history. Requires at least 60 data points; fewer falls back to a threshold-proximity proxy.
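A minimal sketch of this step, under stated assumptions: the latest change is divided by the sample standard deviation of the preceding changes (no mean subtraction), and the function name `change_zscore` is hypothetical. Returning `None` stands in for the fallback to the threshold-proximity proxy, which is not shown.

```python
from statistics import stdev
from typing import Optional, Sequence


def change_zscore(levels: Sequence[float], window: int = 252,
                  min_obs: int = 60) -> Optional[float]:
    """Z-score of the latest first-difference against the rolling std of prior changes."""
    # Work on changes, not levels, as the Metric Selection step requires.
    changes = [b - a for a, b in zip(levels, levels[1:])]
    if not changes:
        return None
    latest = changes[-1]
    history = changes[-(window + 1):-1]  # up to `window` changes before the latest one
    if len(history) < min_obs:
        return None  # too little history: caller falls back to the proxy
    sigma = stdev(history)
    if sigma == 0:
        return None  # flat history carries no volatility information
    return latest / sigma
```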
Directional Alignment
Check whether the change is in the predicted direction for this scenario. A scenario predicting recession expects rising unemployment — a falling unemployment z-score would be counter-evidence, not supporting evidence. Misaligned z-scores reduce the probability rather than increase it.
Correlation Adjustment
Compute the Pearson correlation between the change series of each pair of metrics over the lookback window, then average the pairwise values. Convert to effective degrees of freedom: eff_DoF = total_metrics / (1 + avg_correlation). If all 5 metrics are highly correlated, they collectively carry less information than 5 independent signals. This prevents double-counting.
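The adjustment can be sketched as follows. The formula is the one stated above; clipping a negative average correlation to zero is an assumption added here so that the result never exceeds the metric count, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def effective_dof(change_series: list[list[float]]) -> float:
    """eff_DoF = total_metrics / (1 + avg_correlation), per the text."""
    n = len(change_series)
    pairs = list(combinations(change_series, 2))
    if not pairs:
        return float(n)  # zero or one metric: nothing to adjust
    avg_corr = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    avg_corr = max(avg_corr, 0.0)  # assumption: ignore net-negative correlation
    return n / (1.0 + avg_corr)
```

Under this formula, two uncorrelated metrics count as 2 effective signals and two perfectly correlated metrics count as 1.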
Regime-Conditional Weighting
Some metrics have defined regime thresholds where their economic interpretation changes (e.g., unemployment below 4% vs. above 6%). When a metric is in a specific regime, its weight is adjusted via pre-defined multipliers. This captures non-linear relationships.
Log-Odds Bayesian Update
Convert prior probability to log-odds: LO = log(p / (1-p)). Compute the log-Bayes-factor by summing z_score × direction × weight across metrics, scaling by calibration_constant × regime_modifier, and dividing by sqrt(eff_DoF). Update: LO_new = LO_prior + log_BF. Convert back: p_new = 1 / (1 + exp(-LO_new)). The result is bounded between 5% and 95%.
LO_prior = log(p / (1 - p))
log_BF = Σ(z_i × dir_i × w_i) × calibration_constant × regime_mod / sqrt(eff_DoF)
LO_posterior = LO_prior + log_BF
p_posterior = clamp(1 / (1 + exp(-LO_posterior)), 0.05, 0.95)
The log-odds space ensures updates are symmetric: rising from 10% to 20% requires the same evidence strength as rising from 80% to 90%. The effective degrees-of-freedom divisor prevents correlated metrics from over-inflating updates.
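The four formulas above translate directly into a short function. This is a sketch rather than the production code: the function name and the `(z, direction, weight)` tuple layout are assumptions, while the defaults (calibration constant 0.30, bounds of 5% and 95%) come from the parameter table.

```python
from math import exp, log, sqrt


def bayes_update(prior: float, evidence: list[tuple[float, int, float]],
                 calibration_constant: float = 0.30, regime_mod: float = 1.0,
                 eff_dof: float = 1.0, floor: float = 0.05,
                 ceiling: float = 0.95) -> float:
    """Log-odds update; `evidence` holds (z_score, direction, weight) per metric."""
    lo_prior = log(prior / (1.0 - prior))
    # direction is +1 when the move supports the scenario, -1 when it contradicts it
    raw = sum(z * d * w for z, d, w in evidence)
    log_bf = raw * calibration_constant * regime_mod / sqrt(eff_dof)
    lo_post = lo_prior + log_bf
    p = 1.0 / (1.0 + exp(-lo_post))
    return min(max(p, floor), ceiling)  # clamp to [floor, ceiling]
```

The symmetry claim can be checked with this function: the evidence that lifts 10% to 20% has a log-Bayes-factor of log(2.25), and the same evidence lifts 80% to 90%.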
Time Decay
Between updates, the probability decays toward the calibrated base rate. This prevents stale probabilities from persisting when evidence is absent.
The decay follows an exponential half-life: after one half-life period with no new evidence, the probability moves halfway back toward the base rate. The default half-life is 90 days.
p_decayed = base_rate + (p_current - base_rate) × 0.5^(days_since_update / half_life)
At 90 days without evidence, the probability has moved 50% of the way back to base rate. At 180 days, 75%. At 270 days, 87.5%. The probability asymptotically approaches but never exactly reaches the base rate.
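A short sketch of the decay, assuming the half-life semantics described in the text: the remaining distance from the base rate halves every `half_life` days, which is what the 50% / 75% / 87.5% figures imply. The function name is hypothetical.

```python
def decay_toward_base(p_current: float, base_rate: float,
                      days_since_update: float, half_life: float = 90.0) -> float:
    """Exponential decay toward the base rate with the given half-life in days."""
    remaining = 0.5 ** (days_since_update / half_life)  # fraction of the gap still left
    return base_rate + (p_current - base_rate) * remaining
```

For a scenario at 50% with a 10% base rate, this gives 30% after 90 days, 20% after 180 days, and 15% after 270 days.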
Coherence Enforcement
Individual scenario probabilities are computed independently. But scenarios have structural relationships — some are mutually exclusive, some are reinforcing. After individual computation, a simultaneous iterative pass enforces these constraints. The iteration is order-independent: the same result regardless of which scenario is processed first.
Mutually Exclusive
Scenarios that cannot both materialise simultaneously. Example: Stagflation and Goldilocks. Joint probability ceiling enforced — if both drift up, the less-evidenced one is nudged down.
Reinforcing
Scenarios where one materialising increases the likelihood of another. Example: Trade War escalation reinforces Stagflation. Evidence supporting one provides a smaller boost to the other.
Conditional
Scenarios where one is a prerequisite for another. Conditional relationships inform the analysis prompt context. Automated conditional probability algebra is deferred — for small scenario sets, prompt-based reasoning is more robust than brittle algebraic coupling.
An automated coherence audit runs every 2 hours and after every article generation. It performs three checks:
- Probability coherence: mutually exclusive pairs must not exceed their joint ceiling; total active scenario probabilities must sum to less than 200%.
- Conditional coherence: when a scenario with conditional relationships shifts probability, linked scenarios should shift proportionally. Deviations exceeding 15 percentage points are flagged.
- Lifecycle event logging: every violation is recorded as a ScenarioLifecycleEvent with full context. Violations are surfaced on the dashboard — the system flags, but does not auto-correct, to preserve human oversight.
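The probability check can be sketched as a flag-only audit. Several details are assumptions: the "joint ceiling" is read here as a cap on the sum of the two probabilities, the ceiling values are supplied by the caller because the text does not give them, and the `Violation` record is a hypothetical stand-in for ScenarioLifecycleEvent.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    """Hypothetical stand-in for a logged lifecycle event."""
    kind: str
    detail: str


def audit_probabilities(probs: dict[str, float],
                        exclusive_pairs: list[tuple[str, str, float]],
                        total_cap: float = 2.0) -> list[Violation]:
    """Return violations without modifying any probability (flag, don't auto-correct)."""
    violations = []
    for a, b, ceiling in exclusive_pairs:
        joint = probs[a] + probs[b]
        if joint > ceiling:
            violations.append(Violation("exclusive_ceiling",
                                        f"{a} + {b} = {joint:.2f} > {ceiling:.2f}"))
    total = sum(probs.values())
    if total >= total_cap:  # the text requires the sum to stay below 200%
        violations.append(Violation("total_probability",
                                    f"sum = {total:.2f} >= {total_cap:.2f}"))
    return violations
```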
Heat Classification
Each scenario’s “heat” determines how frequently it is analysed and how prominently it appears in the interface.
Regime transitions detected by the Hamilton 2-state model are confidence-gated: a regime switch only fires when Hamilton confidence exceeds the configured threshold, preventing noise-driven regime flips from artificially escalating heat.
Heat is a composite of four signals:
Evidence Strength
How strongly the watched metrics are currently signalling. Measured by the aggregate weighted z-score magnitude from the evidence model.
Threshold Proximity
How close key metrics are to economically significant thresholds — levels where the market interpretation changes. Proximity to a tipping point warrants more frequent monitoring.
News Intensity
Volume of relevant news events (from RSS pipeline) mentioning the scenario's key themes. Saturates at 10 articles/day to prevent noise-driven heat escalation.
DFM Divergence
When the Dynamic Factor Model's regime probability diverges from the Bayesian scenario probability by more than 15 percentage points, the scenario warrants closer attention. Capped at 0.3 to prevent domination by a single signal.
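One way the four signals might combine is a weighted sum, sketched below. The weights, the 10 articles/day saturation point, the 15-point divergence trigger, and the 0.3 cap come from the text and the parameter table. Everything else is an assumption: that evidence strength and threshold proximity arrive pre-scaled to [0, 1], that the divergence component is the probability gap itself, and that the final score is a simple linear blend.

```python
WEIGHTS = {"evidence": 0.35, "threshold": 0.30, "news": 0.20, "dfm": 0.15}
NEWS_SATURATION = 10   # articles/day at which the news component maxes out
DFM_TRIGGER = 0.15     # divergence (in probability units) that starts to count
DFM_CAP = 0.30         # cap on the divergence component


def _clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def heat_score(evidence_strength: float, threshold_proximity: float,
               articles_per_day: float, dfm_prob: float, bayes_prob: float) -> float:
    """Weighted blend of the four heat signals (illustrative scaling only)."""
    news = min(articles_per_day / NEWS_SATURATION, 1.0)
    gap = abs(dfm_prob - bayes_prob)
    dfm = min(gap, DFM_CAP) if gap > DFM_TRIGGER else 0.0
    return (WEIGHTS["evidence"] * _clamp01(evidence_strength)
            + WEIGHTS["threshold"] * _clamp01(threshold_proximity)
            + WEIGHTS["news"] * news
            + WEIGHTS["dfm"] * dfm)
```

Mapping the score to heat labels such as CRITICAL or COLD is left out because the cut-offs are not given in the text.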
Scenario Radar
The Scenario Radar is the autonomous emergence detection system. Every 6 hours, it scans across 6 independent data source categories looking for macro configurations that don’t match anything currently tracked. The radar operates in regime-aware mode — during high-volatility environments (VIX > 25), detection thresholds tighten to filter out noise; during low-volatility environments (VIX < 15), they loosen to catch subtle signals that might otherwise be missed.
Signals from any single source are never sufficient. The radar requires corroboration across independent source categories before considering anything for promotion.
FRED Surprises
Z-scores on first-differences for 25 economic series across 6 categories (inflation, employment, growth, financial conditions, housing, trade). Each series has a per-series adaptive threshold weighted by macro informativeness — a high-yield spread move carries more weight than a housing start surprise.
Price Moves
N-sigma detection on 7-day returns across equities (SPY, QQQ, IWM), commodities (CL, GC, NG), rates (10Y, 30Y), FX (DXY, USDJPY), and volatility (VIX). Thresholds are regime-adjusted to prevent false positives during high-vol environments.
News Clustering
Quality-weighted analysis of NarrativeCluster formations. Source tier scoring (government > think tank > quality news > financial news > aggregators) combined with a claim specificity heuristic (specific numbers/dates score 1.0, directional language 0.5, vague sentiment 0.2).
CFTC Positioning
Z-score on net speculative position changes in energy, metals, rates, FX, and equity futures. Detects crowding and unwind risk before it appears in price.
Composite Indices
Proprietary indices — Convex Recession Probability (CVRP), Net Liquidity (CNLI), Risk Appetite (CRAI) — surfacing regime-level shifts not visible in individual series.
Cross-Source Divergences
When independent data sources disagree (e.g., bonds pricing recession while equities price expansion), the divergence itself is a signal. Detected automatically by the data resolver's built-in divergence tracking.
When the radar detects an initial signal, it activates cascade tracking — watching for the expected downstream sequence of market reactions. Finding the full cascade confirms the signal is real. Finding no downstream confirmation after the expected time window suggests noise. 7 predefined causal templates:
| Template | Expected Sequence | Window |
|---|---|---|
| Energy Shock | Commodity spike → geopolitical news → inflation expectations → equity repricing | 0–7 days |
| Credit Stress | Spread widening → vol spike → rate moves → financial news confirmation | 0–3 days |
| Geopolitical Escalation | News cluster → commodity reaction → volatility → FX adjustment | 0–3 days |
| Policy Surprise | Policy announcement → rate repricing → FX adjustment → equity response | 0–2 days |
| Trade War Escalation | Trade news → FX impact → equity repricing → inflation pass-through | 0–14 days |
| Carry Unwind | FX move → vol spike → equity drawdown → risk appetite shift | 0–3 days |
| Fiscal Stress | Rate move → financial conditions tightening → FX response → policy reaction | 0–7 days |
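Cascade confirmation can be sketched with the Energy Shock row from the table and the 0.6 confirmation fraction from the parameter table. The class, the step identifiers, and the three status labels are hypothetical, and this sketch does not enforce the order in which steps arrive.

```python
from dataclasses import dataclass


@dataclass
class CascadeTemplate:
    name: str
    steps: list[str]   # expected downstream reactions
    window_days: int   # upper end of the confirmation window


ENERGY_SHOCK = CascadeTemplate(
    "Energy Shock",
    ["commodity_spike", "geopolitical_news", "inflation_expectations", "equity_repricing"],
    7,
)


def cascade_status(template: CascadeTemplate, observed: dict[str, float],
                   days_elapsed: float, confirm_fraction: float = 0.6) -> str:
    """`observed` maps a step name to the day offset at which it was detected."""
    hits = sum(1 for step in template.steps
               if step in observed and observed[step] <= template.window_days)
    if hits / len(template.steps) >= confirm_fraction:
        return "confirmed"
    if days_elapsed > template.window_days:
        return "noise"    # window closed without enough downstream confirmation
    return "pending"      # still inside the window, keep watching
```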
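Every detection threshold is multiplied by a regime factor derived from VIX levels, using the same high-volatility (VIX > 25) and low-volatility (VIX < 15) boundaries as the radar's regime-aware mode.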
This prevents the radar from being overwhelmed with false positives during stress periods while remaining sensitive enough to catch early signals during calm markets.
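The regime adjustment can be illustrated as below. The VIX boundaries (15 and 25) are from the text, but the multiplier values are not given anywhere in the source, so `low_mult` and `high_mult` are purely hypothetical placeholders chosen only to show the direction of the effect.

```python
def regime_factor(vix: float, vix_low: float = 15.0, vix_high: float = 25.0,
                  low_mult: float = 0.8, high_mult: float = 1.25) -> float:
    """Multiplier applied to every detection threshold (multipliers are placeholders)."""
    if vix > vix_high:
        return high_mult  # stress regime: raise thresholds to filter noise
    if vix < vix_low:
        return low_mult   # calm regime: lower thresholds to catch subtle signals
    return 1.0


def adjusted_threshold(base_threshold: float, vix: float) -> float:
    return base_threshold * regime_factor(vix)
```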
Lifecycle Management
Scenarios are not permanent. They have a full lifecycle from emergence through active tracking to resolution or retirement. The system manages this lifecycle automatically, with human review gates at critical transitions.
- Detection: New pattern detected by the Scenario Radar across multiple independent data sources. Tracked as an EmergenceSignal while evidence accumulates over subsequent radar runs.
- Promotion: Signal passes all 3 emergence gates (multi-source + novelty + actionability). Promoted to watchlist tier. Capacity-checked against maximum combined scenario cap (default: 12).
- Watchlist: Reduced monitoring: news and signal accumulation every 48 hours. No article generation. Allows evidence to build without consuming full pipeline resources.
- Active: Full Bayesian pipeline: 4-component heat computation, evidence model, DFM divergence monitoring, article generation via 22-gate pipeline. Re-evaluated every cycle.
- Demotion: Probability below floor (5%) for 3 consecutive cycles, OR heat COLD for 3 consecutive computations with no material change for 30+ days. Scenario moves back to watchlist tier.
- Resolution: Structured resolution conditions (metric + operator + threshold + sustained days) are checked deterministically against live data. All conditions met → flagged for human review.
Scenarios exist in two tracking tiers that determine resource allocation:
Active tier:
- Full 4-component heat computation every cycle
- Bayesian evidence model with log-odds updating
- 22-gate article generation pipeline
- DFM divergence monitoring
- Coherence enforcement against all relationships

Watchlist tier:
- News and signal accumulation only
- Evaluated every 48 hours (not every cycle)
- No article generation
- Serves as evidence accumulation buffer
- Auto-archives after 21 days of no signal activity
Capacity enforcement prevents scenario bloat: the system allows up to 8 active and up to 5 watchlist scenarios, subject to a hard combined cap of 12. When at capacity, the lowest-priority watchlist candidate is demoted to make room for a stronger signal.
Each scenario has machine-checkable resolution conditions — not free-text descriptions, but structured rules that the system evaluates deterministically against live data:
{metric: "BAMLH0A0HYM2", operator: "below", threshold: 400, sustainedDays: 30}
“High-yield spreads below 400 basis points for 30 consecutive days”
The sustainedDays field prevents premature resolution from a single-day spike or dip. The system tracks consecutive days the condition has been met, resetting to zero if the metric reverts. When all conditions on a scenario are satisfied, it is flagged for human review. The system does not auto-retire, to prevent loss of context from premature closure.
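The streak logic can be sketched as follows. The field names mirror the rule shown above; the class and function names are hypothetical, and only the "below" and "above" operators are assumed to exist.

```python
from dataclasses import dataclass


@dataclass
class ResolutionCondition:
    metric: str
    operator: str        # "below" or "above" (assumed operator set)
    threshold: float
    sustained_days: int


def consecutive_days_met(cond: ResolutionCondition, daily_values: list[float]) -> int:
    """Length of the current streak; any day that fails the test resets it to zero."""
    if cond.operator not in ("below", "above"):
        raise ValueError(f"unsupported operator: {cond.operator}")
    streak = 0
    for value in daily_values:  # oldest first
        met = value < cond.threshold if cond.operator == "below" else value > cond.threshold
        streak = streak + 1 if met else 0
    return streak


def ready_for_review(cond: ResolutionCondition, daily_values: list[float]) -> bool:
    """True when the condition has held long enough to flag for human review."""
    return consecutive_days_met(cond, daily_values) >= cond.sustained_days
```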
Two independent probability systems run in parallel and cross-check each other:
Bayesian engine: Disciplined, calibrated. Updates via log-odds on hard data. Slow to move but statistically grounded. Anchored to historical base rates.
LLM (Claude): Creative, pattern-detecting. Generates fresh scenario probabilities every macro analysis cycle. Anchored to Bayesian probabilities to prevent wild numbers.
The LLM’s analysis is informed by the Bayesian probabilities (anchoring), and the Bayesian engine ingests the LLM’s novel pattern detection via emergence signals. Where the two disagree by more than 10 percentage points, that divergence is surfaced as a signal — it means either the data hasn’t caught up to the narrative, or the narrative is wrong. Both are worth investigating.
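The cross-check reduces to comparing two probability maps. A minimal sketch, assuming both systems report probabilities keyed by scenario name; the function name and the output shape are hypothetical.

```python
def divergence_signals(bayes: dict[str, float], llm: dict[str, float],
                       threshold_pp: float = 10.0) -> list[tuple[str, float]]:
    """Scenarios where the LLM and Bayesian estimates differ by more than threshold_pp.

    Returns (scenario, gap) pairs, where gap is LLM minus Bayesian in percentage points.
    """
    signals = []
    for name in bayes.keys() & llm.keys():  # only scenarios both systems cover
        gap_pp = (llm[name] - bayes[name]) * 100.0
        if abs(gap_pp) > threshold_pp:
            signals.append((name, round(gap_pp, 1)))
    return sorted(signals)
```

A positive gap means the narrative is running ahead of the data; a negative gap means the data has moved and the narrative has not.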
Scheduling Architecture
The scenario system runs on four independent pipelines, each scheduled at the cadence that matches its data source update frequency. Running pipelines more frequently than their data sources update wastes compute without improving signal quality.
| Pipeline | Cadence | Schedule | Rationale |
|---|---|---|---|
| Scenario Radar | Every 6 hours | 00:00, 06:00, 12:00, 18:00 UTC | FRED updates daily, news clusters need 4–8 hours to form, CFTC data is weekly. More frequent runs recheck stale data. 4 runs per day captures all meaningful signal windows without waste. |
| Article Generation | Every 6 hours | 01:00, 07:00, 13:00, 19:00 UTC | Self-gates via heat classification — CRITICAL scenarios may generate daily, COLD scenarios auto-skip. Offset 1 hour after radar to incorporate fresh signals. Internal cadence gates prevent over-publication. |
| Lifecycle Evaluation | Daily | 07:30 UTC | Demotion and resolution conditions track slow-moving metrics (multi-day sustained thresholds). Daily evaluation captures all transitions without redundant checks. Runs after morning radar cycle. |
| Coherence Audit | Every 2 hours | Continuous | Lightweight arithmetic check (no LLM calls). Catches probability drift and constraint violations quickly. Also triggers automatically after every article generation for immediate feedback. |
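Pipelines are staggered to allow downstream consumers to incorporate upstream results: article generation runs one hour after each radar cycle, and the daily lifecycle evaluation runs at 07:30 UTC, after the morning radar cycle.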
22-Gate Article Pipeline
Every scenario article passes through 22 sequential gates before publication. This ensures only genuinely significant, novel, quality-validated analysis reaches the public site. Any gate failure stops the pipeline for that scenario.
Article Context Injection
When the article generation gate fires, the AI research desk receives a structured context payload containing every quantitative signal available. This is not a simple “write about X” prompt — it is a comprehensive data briefing that would take a human analyst hours to assemble.
- Scenario definition: title, thesis, why we're tracking it, historical precedents
- Watch metrics: current values, comparison to history, Bayesian evidence assessment per metric
- DFM factor scores: current latent factor state + regime probability
- CFTC positioning: net speculative, commercial net, percentile ranks
- Polymarket: market-implied probabilities where available
- Bilateral stress: relevant country-pair composites + velocity
- Related scenarios: which other scenarios are linked (reinforcing, exclusive)
- Derived signals: net liquidity, yield curve shape, real yields, inflation expectations, etc.
- Recent macro data points from last 30 days
- NVI: narrative velocity reading + top accelerating terms + sentiment convergence
Tunable Parameters
All model parameters are stored in a database configuration table, not hardcoded. Every change is logged with a justification and timestamp. This ensures full auditability and prevents undocumented tweaks.
| Parameter | Default | Purpose |
|---|---|---|
| Calibration Constant | 0.30 | Sensitivity of probability to z-score evidence. Higher = more reactive. |
| Half-Life (days) | 90 | Speed of decay toward base rate in absence of evidence. |
| Heat Weight: Evidence | 0.35 | Contribution of z-score strength to heat classification. |
| Heat Weight: Threshold | 0.30 | Contribution of threshold proximity to heat. |
| Heat Weight: News | 0.20 | Contribution of news volume to heat. |
| Heat Weight: DFM Divergence | 0.15 | Contribution of Dynamic Factor Model divergence to heat. Capped at 0.3. |
| News Saturation Point | 10 | Max articles/day before news contribution caps out. |
| Correlation Window | 252 days | Lookback for cross-metric correlation matrix. |
| Min Observations | 60 | Minimum data points for z-score computation. Below this, fallback to threshold proxy. |
| Probability Floor | 5% | No scenario can go below 5% — no false certainty of impossibility. |
| Probability Ceiling | 95% | No scenario can exceed 95% — no false certainty of inevitability. |
| Demotion Prob Floor | 5% | Probability below this for 3 consecutive cycles triggers demotion to watchlist. |
| Demotion Cycles | 3 | Consecutive below-floor cycles required before demotion. |
| Promotion Signal Count | 3 | Minimum accumulated emergence signals before promotion eligibility. |
| Promotion Min Days | 2 | Minimum days of accumulated signals before promotion (filters transient noise). |
| Max Active Scenarios | 8 | Maximum scenarios in active tier. |
| Max Watchlist | 5 | Maximum scenarios in watchlist tier. |
| Max Combined Cap | 12 | Hard cap on total active + watchlist scenarios. |
| Regime VIX Low | 15 | VIX threshold below which regime is classified as low-vol. |
| Regime VIX High | 25 | VIX threshold above which regime is classified as high-vol. |
| Cascade Confirmation | 0.6 | Fraction of required cascade steps that must be detected for confirmation. |
| News Cluster Quality Floor | 3.0 | Minimum quality-weighted score for a news cluster to register as a radar signal. |
Live parameter values and change history are available on the scenario methodology page.