CONVEX
Methodology 04

Scenario Engine

Convex tracks macro scenarios with Bayesian probability estimates that update as new evidence arrives. Each scenario starts from a calibrated historical base rate and is then updated by a six-stage evidence model. A Scenario Radar discovers emerging risks autonomously across 6 data source categories, while automated lifecycle management handles tiered tracking, demotion, resolution, and retirement. Every published article must clear a 22-gate quality pipeline before it reaches the site.

What Makes This Different

Most scenario analysis you’ll find — from banks, hedge funds, media — works like this: an analyst writes 3–4 scenarios in a PDF, assigns gut-feel probabilities, and publishes it. Six weeks later they might update it. Between updates, it’s static. There’s no feedback loop telling the analyst whether their last set of scenarios was any good.

Convex is building something fundamentally different: a unified, continuously updating, self-correcting scenario intelligence system that discovers emerging risks autonomously, validates them through multi-source evidence gating and causal chain tracing, maintains a coherent probability space across all tracked scenarios, and improves its own accuracy over time.

01

The Scenarios Are Alive

A scenario set published as a static PDF stays frozen between updates: the probabilities don’t move, new scenarios don’t appear, and old ones don’t retire.

Convex scenarios don’t sit in a PDF. Every 6 hours, the system re-evaluates every active scenario against fresh data: new FRED releases, price moves, news clusters, and positioning data. Probabilities are updated each cycle with Bayesian math (the same framework used in quantitative finance and intelligence analysis), not gut feel. If oil spikes 10% overnight, the Energy Supply Shock scenario’s probability adjusts on the next cycle, before any human touches it.

02

Autonomous Scenario Discovery

Most platforms track a fixed list of scenarios someone wrote down. Convex runs a “radar” every 6 hours that scans 6 independent data source categories (economic data, market prices, news clustering, futures positioning, proprietary composite indices, and cross-source divergences) looking for emerging macro configurations that don’t match anything currently being tracked.

If something new is forming, the system spots it, accumulates evidence over multiple days to filter out noise, and promotes it to active tracking when the evidence is strong enough. No human has to notice it first.

03

Multi-Source Evidence Gating

When a new potential scenario emerges, it has to pass three gates before it gets tracked: confirmed by at least 2 independent data source types (not just one news story), assessed as genuinely novel rather than a variant of something already tracked, and verified to produce meaningfully different investment implications.

This is how intelligence agencies validate threats — multiple independent sources, not one loud signal. It separates signal from noise institutionally, not intuitively.

04

Causal Chain Tracing

The system has cascade templates — if oil spikes, it actively watches for the expected downstream sequence: energy news clustering within 24 hours, futures positioning shifts within 1–5 days, inflation expectation changes within 5–15 days.

Finding the full cascade confirms the signal is real. Finding no downstream confirmation after the expected window suggests it was noise. This is how experienced macro traders actually think — they trace transmission mechanisms — but nobody has automated it.

05

Automated Scenario Retirement

Each scenario has structured resolution conditions — specific, measurable thresholds (like “high-yield spreads below 400 basis points for 30 consecutive days”) that are checked automatically against live data.

When the conditions that made a scenario relevant are definitively gone, the system flags it for retirement. It also automatically demotes scenarios whose probability has stayed below the 5% floor for 3 consecutive cycles. This prevents zombie scenarios from cluttering the analysis long after they’ve stopped being relevant.

06

Dual-Brain Architecture

There’s an LLM (Claude) that’s creative and good at spotting novel patterns, and a Bayesian engine that’s disciplined and good at calibrated probability. They run in parallel.

The LLM’s scenario analysis is anchored to the Bayesian probabilities so it doesn’t invent wild numbers. The Bayesian engine ingests the LLM’s novel pattern detection so it doesn’t miss emerging configurations. Where the two disagree by more than 10 percentage points, that divergence is surfaced as an analytical signal — it means either the data hasn’t caught up to the narrative or the narrative is wrong.

07

Coherence Auditing

After every update, a coherence audit checks that the scenario set makes sense as a whole — mutually exclusive scenarios don’t sum to more than their ceiling, probability-weighted asset implications don’t produce logical contradictions, and conditional relationships are respected.

If the system says Credit Crisis probability rose but Fed Pivot probability didn’t shift accordingly, that gets flagged. No bank’s research desk does this automatically.

08

Self-Calibration

The calibration engine tracks whether probability assignments actually matched reality, whether asset price predictions were biased, which data sources were actually predictive for which types of scenarios, and whether the system was consistently early or late.

Those findings feed back into the system’s parameters, so its accuracy improves over time.

What you can’t get elsewhere

Individual pieces of this exist in quantitative hedge funds, but they’re siloed — the scenario team doesn’t talk to the data pipeline team doesn’t talk to the calibration team. Nobody has it as one integrated system, and certainly nobody publishes it.

Calibrated Base Rates

Every scenario starts with a calibrated base rate probability derived from a specific historical reference class. This is not a guess or a round number: it is the empirical frequency of similar conditions in the historical record, adjusted for current structural differences.

For example, a stagflation scenario might use the reference class “quarters with CPI >4% AND unemployment >5% since 1970”. The base rate is the fraction of such quarters that preceded a sustained stagflation episode.
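The base-rate arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record layout and field names (`Quarter`, `cpi_yoy`, `preceded_stagflation`) are hypothetical, the quarters are synthetic rather than historical, and the adjustment for current structural differences is not specified in the text, so it is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quarter:
    # Hypothetical record layout for one historical quarter.
    label: str
    cpi_yoy: float               # CPI inflation, % year over year
    unemployment: float          # unemployment rate, %
    preceded_stagflation: bool   # did a sustained stagflation episode follow?


def in_reference_class(q: Quarter) -> bool:
    # Reference class from the example: CPI > 4% AND unemployment > 5%.
    return q.cpi_yoy > 4.0 and q.unemployment > 5.0


def base_rate(history: list[Quarter]) -> tuple[float, int]:
    """Return (fraction of reference-class quarters that preceded an episode, n)."""
    members = [q for q in history if in_reference_class(q)]
    if not members:
        raise ValueError("empty reference class: no base rate can be computed")
    hits = sum(q.preceded_stagflation for q in members)
    return hits / len(members), len(members)


# Synthetic quarters for illustration only; these are not historical data.
history = [
    Quarter("Q-a", 5.1, 5.8, True),
    Quarter("Q-b", 4.6, 6.2, False),
    Quarter("Q-c", 6.0, 7.1, True),
    Quarter("Q-d", 3.2, 5.5, False),  # excluded: CPI not above 4%
    Quarter("Q-e", 4.8, 4.1, False),  # excluded: unemployment not above 5%
    Quarter("Q-f", 4.3, 5.3, False),
]

rate, n = base_rate(history)
print(f"base rate = {rate:.2f} (n = {n})")  # prints: base rate = 0.50 (n = 4)
```

Returning the reference-class size alongside the rate keeps small samples visible, which matters for the cases flagged in the warning that follows.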

Small reference class warning: Some scenarios (Fiscal Dominance with n=3, Trade War with n=8 years) have inherently small reference classes. This is documented in the limitations register, and comparison with market-implied probabilities provides an external anchor.

Evidence Model

New evidence shifts the scenario probability via Bayesian log-odds updating. The evidence model has six stages, each applying a specific statistical technique.

01

Metric Selection

Each scenario monitors 5–20 key indicators selected for their economic relevance to the scenario thesis. Metrics track first-differences (changes, not levels) — this is critical. Using levels creates spurious correlation from trending series; first-differences isolate genuinely new information.

02

Z-Score Computation

For each metric, compute the z-score of the recent change against a rolling 252-day standard deviation of changes. This captures how unusual the recent movement is relative to the metric's own volatility history. The computation requires at least 60 data points; with fewer, the model falls back to a threshold-proximity proxy.
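Stages 1 and 2 can be sketched as follows: take first differences of a level series, then score the latest change against the standard deviation of the preceding changes, capped at a 252-change window. This is a sketch, not Convex's code; the demeaning step, excluding the latest change from its own window, and returning `None` in place of the threshold-proximity proxy are assumptions.

```python
import statistics
from typing import Optional

WINDOW = 252   # rolling lookback for the std of changes (document default)
MIN_OBS = 60   # minimum historical changes required (document default)


def first_differences(levels: list[float]) -> list[float]:
    # Stage 1: work with changes, not levels, to avoid spurious trend correlation.
    return [b - a for a, b in zip(levels, levels[1:])]


def change_zscore(levels: list[float]) -> Optional[float]:
    """Z-score of the latest change against the preceding window of changes.

    Returns None when fewer than MIN_OBS historical changes are available (or the
    window is flat); the caller would then use the threshold-proximity proxy,
    which is not sketched here.
    """
    changes = first_differences(levels)
    if not changes:
        return None
    latest, history = changes[-1], changes[:-1][-WINDOW:]
    if len(history) < MIN_OBS:
        return None
    sd = statistics.stdev(history)
    if sd == 0:
        return None
    # Assumption: demean by the window's average change, as in a standard z-score.
    return (latest - statistics.fmean(history)) / sd


# Synthetic series: 100 alternating +1/-1 moves, then a +5 jump.
levels = [0.0]
for i in range(100):
    levels.append(levels[-1] + (1.0 if i % 2 == 0 else -1.0))
levels.append(levels[-1] + 5.0)
print(round(change_zscore(levels), 3))
```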

03

Directional Alignment

Check whether the change is in the predicted direction for this scenario. A scenario predicting recession expects rising unemployment — a falling unemployment z-score would be counter-evidence, not supporting evidence. Misaligned z-scores reduce the probability rather than increase it.

04

Correlation Adjustment

Compute the Pearson correlation between the change series of each pair of metrics over the lookback window. Convert the average pairwise correlation to effective degrees of freedom: eff_DoF = total_metrics / (1 + avg_correlation). If five metrics are all highly correlated, they collectively carry less information than five independent signals. This prevents double-counting.

05

Regime-Conditional Weighting

Some metrics have defined regime thresholds where their economic interpretation changes (e.g., unemployment below 4% vs. above 6%). When a metric is in a specific regime, its weight is adjusted via pre-defined multipliers. This captures non-linear relationships.
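The regime-conditional weighting step might look like the sketch below. `RegimeBand`, `regime_weight`, and the multipliers (0.7, 1.0, 1.4) are hypothetical placeholders; only the 4% and 6% unemployment cut-offs come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegimeBand:
    # Hypothetical structure: a level range and the weight multiplier applied in it.
    lower: float        # inclusive lower bound
    upper: float        # exclusive upper bound
    multiplier: float


# The 4% and 6% cut-offs come from the example above; the multipliers are
# placeholders, not Convex's production values.
UNEMPLOYMENT_BANDS = [
    RegimeBand(float("-inf"), 4.0, 0.7),
    RegimeBand(4.0, 6.0, 1.0),
    RegimeBand(6.0, float("inf"), 1.4),
]


def regime_weight(base_weight: float, level: float, bands: list[RegimeBand]) -> float:
    """Scale a metric's base weight by the multiplier of the regime its level is in."""
    for band in bands:
        if band.lower <= level < band.upper:
            return base_weight * band.multiplier
    return base_weight  # no band matched: leave the weight unchanged


for u in (3.6, 5.0, 6.8):
    print(u, regime_weight(1.0, u, UNEMPLOYMENT_BANDS))
```

Note that the regime is judged on the metric's level, even though the evidence itself is computed from first differences.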

06

Log-Odds Bayesian Update

Convert the prior probability to log-odds: LO = log(p / (1-p)). Compute the log-Bayes-factor: sum z_score × direction × weight across metrics, multiply by calibration_constant × regime_modifier, and divide by sqrt(eff_DoF). Update: LO_new = LO_prior + log_BF. Convert back: p_new = 1 / (1 + exp(-LO_new)), bounded between 5% and 95%.

Bayesian Update Formula

LO_prior = log(p / (1 - p))

log_BF = Σ(z_i × dir_i × w_i) × calibration_constant × regime_mod / sqrt(eff_DoF)

LO_posterior = LO_prior + log_BF

p_posterior = clamp(1 / (1 + exp(-LO_posterior)), 0.05, 0.95)

The log-odds space ensures updates are symmetric: rising from 10% to 20% requires the same evidence strength as rising from 80% to 90%. The effective degrees-of-freedom divisor prevents correlated metrics from over-inflating updates.

Time Decay

Between updates, the probability decays toward the calibrated base rate. This prevents stale probabilities from persisting when evidence is absent.

The decay follows an exponential half-life: after one half-life period with no new evidence, the probability moves halfway back toward the base rate. The default half-life is 90 days.

Decay Formula

p_decayed = base_rate + (p_current - base_rate) × exp(-ln(2) × days_since_update / half_life)

At 90 days without evidence, the probability has moved 50% of the way back to base rate. At 180 days, 75%. At 270 days, 87.5%. The probability asymptotically approaches but never exactly reaches the base rate.
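A sketch of the decay step in Python, with a hypothetical function name. It uses 0.5 ** (t / half_life), which equals exp(-ln 2 × t / half_life); that ln 2 factor is what makes 90 days a true half-life, whereas a bare exp(-t / half_life) would close about 63% of the gap after 90 days rather than 50%.

```python
def decay_toward_base(p_current: float, base_rate: float, days_since_update: float,
                      half_life: float = 90.0) -> float:
    """Exponential decay that closes half of the gap to the base rate per half-life."""
    remaining = 0.5 ** (days_since_update / half_life)  # same as exp(-ln 2 * t / half_life)
    return base_rate + (p_current - base_rate) * remaining


for days in (0, 90, 180, 270):
    print(days, round(decay_toward_base(0.50, 0.10, days), 4))
```

Starting 40 points above a 10% base rate, the gap shrinks to 20, 10 and 5 points at 90, 180 and 270 days. A probability below the base rate decays upward in the same way.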

Coherence Enforcement

Individual scenario probabilities are computed independently. But scenarios have structural relationships — some are mutually exclusive, some are reinforcing. After individual computation, a simultaneous iterative pass enforces these constraints. The iteration is order-independent: the same result regardless of which scenario is processed first.

Mutually Exclusive

Scenarios that cannot both materialise simultaneously. Example: Stagflation and Goldilocks. Joint probability ceiling enforced — if both drift up, the less-evidenced one is nudged down.

Reinforcing

Scenarios where one materialising increases the likelihood of another. Example: Trade War escalation reinforces Stagflation. Evidence supporting one provides a smaller boost to the other.

Conditional

Scenarios where one is a prerequisite for another. Conditional relationships inform the analysis prompt context. Automated conditional probability algebra is deferred — for small scenario sets, prompt-based reasoning is more robust than brittle algebraic coupling.
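The mutually exclusive constraint and its order independence can be sketched as a simultaneous (Jacobi-style) pass: every cut in an iteration is computed from the same snapshot of probabilities before any is applied. This is an illustration under assumptions, not the production algorithm; the joint ceilings, the evidence-strength scores, and sharing the cut on ties are hypothetical choices, and reinforcing links are not modelled.

```python
FLOOR = 0.05  # probability floor from the parameter table


def enforce_exclusive(
    probs: dict[str, float],
    evidence: dict[str, float],              # evidence strength per scenario
    pairs: list[tuple[str, str, float]],     # (scenario_a, scenario_b, joint_ceiling)
    max_iter: int = 50,
    tol: float = 1e-9,
) -> dict[str, float]:
    """Nudge down the less-evidenced member of each over-ceiling exclusive pair."""
    p = dict(probs)
    for _ in range(max_iter):
        # Every cut is computed from the same snapshot, so the result does not
        # depend on the order in which pairs (or scenarios) are listed.
        cuts: dict[str, float] = {}
        for a, b, ceiling in pairs:
            excess = p[a] + p[b] - ceiling
            if excess <= tol:
                continue
            if evidence[a] < evidence[b]:
                wanted = {a: excess}
            elif evidence[b] < evidence[a]:
                wanted = {b: excess}
            else:                                 # equal evidence: share the cut
                wanted = {a: excess / 2, b: excess / 2}
            for name, cut in wanted.items():
                cuts[name] = max(cuts.get(name, 0.0), cut)
        changed = False
        for name, cut in cuts.items():
            new_p = max(p[name] - cut, FLOOR)
            if new_p < p[name] - tol:
                p[name] = new_p
                changed = True
        if not changed:                           # converged, or blocked by the floor
            break
    return p


result = enforce_exclusive({"Stagflation": 0.45, "Goldilocks": 0.40},
                           {"Stagflation": 2.0, "Goldilocks": 1.0},
                           [("Stagflation", "Goldilocks", 0.70)])
print(result)
```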

Automated Coherence Audit

An automated coherence audit runs every 2 hours and after every article generation. It performs three checks:

  • Probability coherence: mutually exclusive pairs must not exceed their joint ceiling; total active scenario probabilities must sum to less than 200%.
  • Conditional coherence: when a scenario with conditional relationships shifts probability, linked scenarios should shift proportionally. Deviations exceeding 15 percentage points are flagged.
  • Lifecycle event logging: every violation is recorded as a ScenarioLifecycleEvent with full context. Violations are surfaced on the dashboard — the system flags, but does not auto-correct, to preserve human oversight.
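The three audit checks can be sketched as a function that returns violations instead of changing probabilities, mirroring the flag-but-don't-correct policy. `Violation` is a simplified, hypothetical stand-in for a ScenarioLifecycleEvent, and because the text does not say how the expected proportional shift of a linked scenario is computed, the sketch takes it as an input.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    # Simplified stand-in for a ScenarioLifecycleEvent; these fields are hypothetical.
    kind: str
    detail: str


def audit(
    probs: dict[str, float],
    exclusive_pairs: list[tuple[str, str, float]],  # (scenario_a, scenario_b, joint_ceiling)
    expected_shifts: dict[str, float],  # hypothetical: expected move of each linked scenario
    actual_shifts: dict[str, float],    # observed move of each linked scenario
    conditional_tolerance: float = 0.15,
) -> list[Violation]:
    """Flag coherence problems without modifying any probability."""
    violations: list[Violation] = []
    # Probability coherence: exclusive pairs must stay under their joint ceiling.
    for a, b, ceiling in exclusive_pairs:
        joint = probs[a] + probs[b]
        if joint > ceiling:
            violations.append(Violation("exclusive_ceiling", f"{a} + {b} = {joint:.2f} > {ceiling:.2f}"))
    # Probability coherence: total active probability must stay below 200%.
    total = sum(probs.values())
    if total >= 2.0:
        violations.append(Violation("total_probability", f"sum = {total:.2f} >= 2.00"))
    # Conditional coherence: flag deviations above 15 percentage points.
    for name, expected in expected_shifts.items():
        gap = abs(actual_shifts.get(name, 0.0) - expected)
        if gap > conditional_tolerance:
            violations.append(Violation("conditional", f"{name} off by {gap * 100:.0f}pp"))
    return violations


probs = {"Stagflation": 0.45, "Goldilocks": 0.35, "Credit Crisis": 0.30, "Fed Pivot": 0.20}
found = audit(probs, [("Stagflation", "Goldilocks", 0.70)], {"Fed Pivot": 0.12}, {"Fed Pivot": -0.05})
for v in found:
    print(v.kind, v.detail)
```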

Heat Classification

Each scenario’s “heat” determines how frequently it is analysed and how prominently it appears in the interface.

Regime transitions detected by the Hamilton 2-state model are confidence-gated: a regime switch only fires when Hamilton confidence exceeds the configured threshold, which prevents noise-driven regime flips from artificially escalating heat.

Heat is a weighted composite of four signals:

35%

Evidence Strength

How strongly the watched metrics are currently signalling. Measured by the aggregate weighted z-score magnitude from the evidence model.

30%

Threshold Proximity

How close key metrics are to economically significant thresholds — levels where the market interpretation changes. Proximity to a tipping point warrants more frequent monitoring.

20%

News Intensity

Volume of relevant news events (from the RSS pipeline) mentioning the scenario's key themes. Saturates at 10 articles/day to prevent noise-driven heat escalation.

15%

DFM Divergence

When the Dynamic Factor Model's regime probability diverges from the Bayesian scenario probability by more than 15 percentage points, the scenario warrants closer attention. Capped at 0.3 to prevent domination by a single signal.

Heat Levels & Cadence

Heat level | Cadence | Focus
CRITICAL | Daily | Highest attention, maximum article frequency
HOT | Every 2–3 days | Active monitoring, frequent updates
WARM | Weekly | Standard surveillance
COLD | Every 2 weeks | Background tracking
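Putting the four heat components together, a heat score and level mapping might look like the sketch below. The weights and the 10 articles/day saturation come from the text; the level cut-offs, the pre-normalisation of the evidence and threshold components to [0, 1], and the reading of the DFM rule (ignore gaps of 15 points or less, clip at 0.3, rescale) are assumptions.

```python
WEIGHTS = {"evidence": 0.35, "threshold": 0.30, "news": 0.20, "dfm": 0.15}
NEWS_SATURATION = 10.0        # articles/day at which the news component saturates
DFM_TRIGGER, DFM_CAP = 0.15, 0.30

# Hypothetical cut-offs: the level boundaries are not published in this document.
LEVELS = [(0.75, "CRITICAL"), (0.50, "HOT"), (0.25, "WARM")]
CADENCE_DAYS = {"CRITICAL": 1, "HOT": 2, "WARM": 7, "COLD": 14}  # HOT uses the 2-day end


def heat_score(evidence: float, threshold: float, news_per_day: float, dfm_divergence: float) -> float:
    """Weighted composite; evidence and threshold are assumed pre-normalised to [0, 1]."""
    news = min(news_per_day / NEWS_SATURATION, 1.0)
    # One reading of the DFM rule: ignore gaps of 15pp or less, clip at 0.3, rescale to [0, 1].
    dfm = min(dfm_divergence, DFM_CAP) / DFM_CAP if dfm_divergence > DFM_TRIGGER else 0.0
    return (WEIGHTS["evidence"] * evidence + WEIGHTS["threshold"] * threshold
            + WEIGHTS["news"] * news + WEIGHTS["dfm"] * dfm)


def heat_level(score: float) -> str:
    for cutoff, label in LEVELS:
        if score >= cutoff:
            return label
    return "COLD"


score = heat_score(evidence=0.5, threshold=0.5, news_per_day=5, dfm_divergence=0.20)
print(round(score, 3), heat_level(score), CADENCE_DAYS[heat_level(score)])
```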

Scenario Radar

The Scenario Radar is the autonomous emergence detection system. Every 6 hours, it scans across 6 independent data source categories looking for macro configurations that don’t match anything currently tracked. The radar operates in regime-aware mode — during high-volatility environments (VIX > 25), detection thresholds tighten to filter out noise; during low-volatility environments (VIX < 15), they loosen to catch subtle signals that might otherwise be missed.

Signals from any single source are never sufficient. The radar requires corroboration across independent source categories before considering anything for promotion.

Signal Sources

FRED Surprises

Z-scores on first-differences for 25 economic series across 6 categories (inflation, employment, growth, financial conditions, housing, trade). Each series has a per-series adaptive threshold weighted by macro informativeness — a high-yield spread move carries more weight than a housing start surprise.

Price Moves

N-sigma detection on 7-day returns across equities (SPY, QQQ, IWM), commodities (CL, GC, NG), rates (10Y, 30Y), FX (DXY, USDJPY), and volatility (VIX). Thresholds are regime-adjusted to prevent false positives during high-vol environments.

News Clustering

Quality-weighted analysis of NarrativeCluster formations. Source tier scoring (government > think tank > quality news > financial news > aggregators) combined with a claim specificity heuristic (specific numbers/dates score 1.0, directional language 0.5, vague sentiment 0.2).
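The quality weighting can be sketched as tier weight × claim specificity, summed over a cluster and compared with the 3.0 quality floor from the parameter table. The specificity scores (1.0, 0.5, 0.2) come from the text; the numeric tier weights, the directional word list, and the digit-based test for specific numbers or dates are hypothetical.

```python
import re

# Hypothetical numeric weights; the text gives only the ordering of tiers.
TIER_WEIGHT = {"government": 5.0, "think_tank": 4.0, "quality_news": 3.0,
               "financial_news": 2.0, "aggregator": 1.0}
# Hypothetical vocabulary for "directional language".
DIRECTIONAL = {"rise", "rises", "rising", "fall", "falls", "falling", "surge",
               "plunge", "higher", "lower", "tighten", "ease"}
QUALITY_FLOOR = 3.0  # "News Cluster Quality Floor" default


def specificity(claim: str) -> float:
    if re.search(r"\d", claim):        # specific numbers or dates present
        return 1.0
    words = set(re.findall(r"[a-z]+", claim.lower()))
    if words & DIRECTIONAL:            # directional language
        return 0.5
    return 0.2                         # vague sentiment


def cluster_quality(items: list[tuple[str, str]]) -> float:
    """items: (source_tier, claim_text) pairs belonging to one NarrativeCluster."""
    return sum(TIER_WEIGHT[tier] * specificity(claim) for tier, claim in items)


def registers_as_signal(items: list[tuple[str, str]]) -> bool:
    return cluster_quality(items) >= QUALITY_FLOOR


cluster = [("government", "Tariff of 25% effective March 1"), ("aggregator", "Markets nervous")]
print(cluster_quality(cluster), registers_as_signal(cluster))
```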

CFTC Positioning

Z-score on net speculative position changes in energy, metals, rates, FX, and equity futures. Detects crowding and unwind risk before it appears in price.

Composite Indices

Proprietary indices — Convex Recession Probability (CVRP), Net Liquidity (CNLI), Risk Appetite (CRAI) — surfacing regime-level shifts not visible in individual series.

Cross-Source Divergences

When independent data sources disagree (e.g., bonds pricing recession while equities price expansion), the divergence itself is a signal. Detected automatically by the data resolver's built-in divergence tracking.

Emergence Gates
1
Multi-Source Confirmation: Signal cluster must include signals from at least 2 distinct source categories. A single news story or one data point is never enough — this is how intelligence agencies validate threats.
2
Novelty Assessment: LLM evaluates the candidate against all active and watchlist scenarios across 5 dimensions: root cause, transmission mechanism, affected assets, policy response, and resolution conditions. Variants of existing scenarios are absorbed, not duplicated.
3
Actionability: Would this scenario require materially different portfolio positioning than anything currently tracked? If the investment implications are identical to an existing scenario, it does not earn a tracking slot.
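Gate 1 and the accumulation thresholds are mechanical, so they can be sketched directly; gates 2 and 3 are judgements and appear here only as boolean inputs. The `EmergenceSignal` shape is hypothetical, the thresholds (3 signals, 2 days) come from the parameter table, and counting days as distinct observation dates is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EmergenceSignal:
    # Hypothetical shape of one accumulated radar observation.
    source: str      # source category, e.g. "fred", "prices", "news", "cftc"
    observed: date


PROMOTION_SIGNAL_COUNT = 3   # "Promotion Signal Count" default
PROMOTION_MIN_DAYS = 2       # "Promotion Min Days" default


def passes_multi_source(signals: list[EmergenceSignal]) -> bool:
    # Gate 1: at least 2 distinct source categories.
    return len({s.source for s in signals}) >= 2


def promotion_eligible(signals: list[EmergenceSignal], is_novel: bool, is_actionable: bool) -> bool:
    """Gates 2 and 3 are LLM/analyst judgements, passed in here as booleans."""
    enough_signals = len(signals) >= PROMOTION_SIGNAL_COUNT
    # Assumption: "days of accumulated signals" = number of distinct observation dates.
    enough_days = len({s.observed for s in signals}) >= PROMOTION_MIN_DAYS
    return (enough_signals and enough_days and passes_multi_source(signals)
            and is_novel and is_actionable)


sigs = [EmergenceSignal("fred", date(2025, 3, 3)),
        EmergenceSignal("news", date(2025, 3, 3)),
        EmergenceSignal("cftc", date(2025, 3, 4))]
print(promotion_eligible(sigs, is_novel=True, is_actionable=True))
```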
Cascade Templates

When the radar detects an initial signal, it activates cascade tracking — watching for the expected downstream sequence of market reactions. Finding the full cascade confirms the signal is real. Finding no downstream confirmation after the expected time window suggests noise. 7 predefined causal templates:

Template | Expected sequence | Window
Energy Shock | Commodity spike → geopolitical news → inflation expectations → equity repricing | 0–7 days
Credit Stress | Spread widening → vol spike → rate moves → financial news confirmation | 0–3 days
Geopolitical Escalation | News cluster → commodity reaction → volatility → FX adjustment | 0–3 days
Policy Surprise | Policy announcement → rate repricing → FX adjustment → equity response | 0–2 days
Trade War Escalation | Trade news → FX impact → equity repricing → inflation pass-through | 0–14 days
Carry Unwind | FX move → vol spike → equity drawdown → risk appetite shift | 0–3 days
Fiscal Stress | Rate move → financial conditions tightening → FX response → policy reaction | 0–7 days

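Cascade confirmation can be sketched as the fraction of downstream steps detected inside the template window, compared with the 0.6 confirmation fraction from the parameter table. Counting only the steps after the trigger, ignoring step order, and the status labels are assumptions of this sketch.

```python
from dataclasses import dataclass

CONFIRMATION_FRACTION = 0.6   # "Cascade Confirmation" default


@dataclass(frozen=True)
class CascadeTemplate:
    name: str
    steps: tuple[str, ...]   # first entry is the triggering signal
    window_days: float


ENERGY_SHOCK = CascadeTemplate(
    "Energy Shock",
    ("commodity_spike", "geopolitical_news", "inflation_expectations", "equity_repricing"),
    7.0,
)


def cascade_status(template: CascadeTemplate, observed: dict[str, float], elapsed_days: float) -> str:
    """observed maps a step name to the day offset (after the trigger) it was detected."""
    downstream = template.steps[1:]
    hits = sum(1 for step in downstream
               if step in observed and 0.0 <= observed[step] <= template.window_days)
    if hits / len(downstream) >= CONFIRMATION_FRACTION:
        return "confirmed"
    if elapsed_days > template.window_days:
        return "likely_noise"     # window closed without enough downstream confirmation
    return "pending"


print(cascade_status(ENERGY_SHOCK, {"geopolitical_news": 0.5, "inflation_expectations": 3.0}, 4.0))
```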
Regime-Adaptive Thresholds

Every detection threshold is multiplied by a regime factor derived from VIX levels:

Regime | VIX level | Threshold multiplier
Low Vol | VIX < 15 | 0.8× (more sensitive)
Normal | VIX 15–25 | 1.0× (baseline)
High Vol | VIX > 25 | 1.5× (higher bar)

This prevents the radar from being overwhelmed with false positives during stress periods while remaining sensitive enough to catch early signals during calm markets.
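The regime factor and the n-sigma price-move test combine as shown below. The VIX boundaries and multipliers come from the table above; the function names, the base sigma of 2.0 (the document does not state N), and the returns are hypothetical or synthetic.

```python
import statistics

VIX_LOW, VIX_HIGH = 15.0, 25.0   # "Regime VIX Low/High" defaults


def regime_multiplier(vix: float) -> float:
    """Threshold multiplier from the VIX regime table (boundaries count as Normal)."""
    if vix < VIX_LOW:
        return 0.8
    if vix > VIX_HIGH:
        return 1.5
    return 1.0


def is_nsigma_move(past_returns: list[float], latest: float, base_sigma: float, vix: float) -> bool:
    """Flag a 7-day return whose |z| clears base_sigma scaled by the regime multiplier."""
    z = (latest - statistics.fmean(past_returns)) / statistics.stdev(past_returns)
    return abs(z) >= base_sigma * regime_multiplier(vix)


history = [0.01, -0.01] * 20          # synthetic 7-day returns
for vix in (12.0, 20.0, 30.0):
    print(vix, is_nsigma_move(history, 0.025, base_sigma=2.0, vix=vix))
```

The same move (about 2.5 sigma here) is flagged in low-vol and normal regimes but not when VIX is above 25, where the bar rises to 3.0 sigma.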

Lifecycle Management

Scenarios are not permanent. They have a full lifecycle from emergence through active tracking to resolution or retirement. The system manages this lifecycle automatically, with human review gates at critical transitions.

Emergence

New pattern detected by the Scenario Radar across multiple independent data sources. Tracked as an EmergenceSignal while evidence accumulates over subsequent radar runs.

Promotion

Signal passes all 3 emergence gates (multi-source + novelty + actionability). Promoted to watchlist tier. Capacity-checked against maximum combined scenario cap (default: 12).

Watchlist

Reduced monitoring: news and signal accumulation every 48 hours. No article generation. Allows evidence to build without consuming full pipeline resources.

Active

Full Bayesian pipeline: 4-component heat computation, evidence model, DFM divergence monitoring, article generation via 22-gate pipeline. Re-evaluated every cycle.

Demotion

Probability below floor (5%) for 3 consecutive cycles, OR heat COLD for 3 consecutive computations with no material change for 30+ days. Scenario moves back to watchlist tier.

Resolution

Structured resolution conditions (metric + operator + threshold + sustained days) are checked deterministically against live data. All conditions met → flagged for human review.
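The demotion rule can be sketched as below. `ScenarioHistory` and `should_demote` are hypothetical names. Because the probability floor clamps values at 5%, this sketch reads "below floor" as "at or below", which is an interpretation rather than a documented rule.

```python
from dataclasses import dataclass

DEMOTION_PROB_FLOOR = 0.05
DEMOTION_CYCLES = 3


@dataclass
class ScenarioHistory:
    # Hypothetical per-scenario history, most recent entry last.
    probabilities: list[float]
    heat_levels: list[str]
    days_since_material_change: int


def should_demote(h: ScenarioHistory) -> bool:
    recent_p = h.probabilities[-DEMOTION_CYCLES:]
    # Assumption: because probabilities are clamped at the 5% floor, "below the floor"
    # is treated here as "at or below" so the rule can actually trigger.
    stuck_at_floor = (len(recent_p) == DEMOTION_CYCLES
                      and all(p <= DEMOTION_PROB_FLOOR for p in recent_p))
    recent_heat = h.heat_levels[-DEMOTION_CYCLES:]
    cold_and_stale = (len(recent_heat) == DEMOTION_CYCLES
                      and all(level == "COLD" for level in recent_heat)
                      and h.days_since_material_change >= 30)
    return stuck_at_floor or cold_and_stale


print(should_demote(ScenarioHistory([0.20, 0.05, 0.05, 0.05], ["WARM"] * 4, 5)))
```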

Tiered Tracking

Scenarios exist in two tracking tiers that determine resource allocation:

Active Tier
  • Full 4-component heat computation every cycle
  • Bayesian evidence model with log-odds updating
  • 22-gate article generation pipeline
  • DFM divergence monitoring
  • Coherence enforcement against all relationships
Watchlist Tier
  • News and signal accumulation only
  • Evaluated every 48 hours (not every cycle)
  • No article generation
  • Serves as evidence accumulation buffer
  • Auto-archives after 21 days of no signal activity

Capacity enforcement prevents scenario bloat: the system maintains a maximum of 8 active and 5 watchlist scenarios, subject to a combined cap of 12. Because 8 + 5 exceeds 12, the combined cap is the binding limit when both tiers are full. When at capacity, the lowest-priority watchlist candidate is demoted to make room for a stronger signal.
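Capacity enforcement might look like the sketch below. The function name and the `priority` score are hypothetical (the text does not define how candidates are ranked), and the function mutates the watchlist in place.

```python
from typing import Optional

MAX_ACTIVE, MAX_WATCHLIST, MAX_COMBINED = 8, 5, 12


def admit_to_watchlist(active: dict[str, float], watchlist: dict[str, float],
                       name: str, priority: float) -> tuple[bool, Optional[str]]:
    """Try to add a promoted candidate to the watchlist; return (admitted, evicted_name)."""
    has_room = (len(watchlist) < MAX_WATCHLIST
                and len(active) + len(watchlist) < MAX_COMBINED)
    if has_room:
        watchlist[name] = priority
        return True, None
    if not watchlist:
        return False, None
    weakest = min(watchlist, key=watchlist.get)
    if watchlist[weakest] >= priority:
        return False, None          # candidate is not stronger than anything tracked
    del watchlist[weakest]          # demote the lowest-priority watchlist entry
    watchlist[name] = priority
    return True, weakest


active = {f"A{i}": 1.0 for i in range(MAX_ACTIVE)}
watchlist_demo = {"W1": 0.1, "W2": 0.2, "W3": 0.3, "W4": 0.4}
print(admit_to_watchlist(active, watchlist_demo, "New", 0.9))
```

With 8 active scenarios, only 4 watchlist slots are usable before the combined cap of 12 binds, so the fifth candidate must displace a weaker one.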

Structured Resolution Conditions

Each scenario has machine-checkable resolution conditions — not free-text descriptions, but structured rules that the system evaluates deterministically against live data:

{metric: "BAMLH0A0HYM2", operator: "below", threshold: 400, sustainedDays: 30}

“High-yield spreads below 400 basis points for 30 consecutive days”

The sustainedDays field prevents premature resolution from a single-day spike or dip. The system tracks consecutive days the condition has been met, resetting to zero if the metric reverts. When all conditions on a scenario are satisfied, it is flagged for human review; the system does not auto-retire, to prevent loss of context from premature closure.
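A sketch of deterministic resolution checking with a consecutive-day streak. The field names mirror the structured rule above; the `above` operator, the class and function names, and the daily values are assumptions or illustrations, not real spread data.

```python
import operator
from dataclasses import dataclass

# "below" appears in the document; "above" is an assumed counterpart.
OPERATORS = {"below": operator.lt, "above": operator.gt}


@dataclass
class ResolutionCondition:
    metric: str
    operator: str
    threshold: float
    sustainedDays: int          # field name mirrors the structured rule above
    consecutive_days: int = 0   # running streak, reset to zero when the condition breaks

    def observe(self, value: float) -> bool:
        """Feed one daily value; return True once the streak reaches sustainedDays."""
        if OPERATORS[self.operator](value, self.threshold):
            self.consecutive_days += 1
        else:
            self.consecutive_days = 0
        return self.consecutive_days >= self.sustainedDays


def needs_human_review(conditions: list[ResolutionCondition], today: dict[str, float]) -> bool:
    # Build the full list first so every counter updates (all() on a generator would stop early).
    results = [c.observe(today[c.metric]) for c in conditions]
    return all(results)   # flag for review only; nothing is retired automatically


hy = ResolutionCondition("BAMLH0A0HYM2", "below", 400, sustainedDays=3)
flags = [needs_human_review([hy], {"BAMLH0A0HYM2": s}) for s in (390, 380, 410, 390, 395, 399)]
print(flags)
```

The reading of 410 on the third day resets the streak, so the condition is only flagged after three further consecutive days below 400.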

Dual-Brain Architecture

Two independent probability systems run in parallel and cross-check each other:

Bayesian Engine

Disciplined, calibrated. Updates via log-odds on hard data. Slow to move but statistically grounded. Anchored to historical base rates.

LLM Analyzer

Creative, pattern-detecting. Generates fresh scenario probabilities every macro analysis cycle. Anchored to Bayesian probabilities to prevent wild numbers.

The LLM’s analysis is informed by the Bayesian probabilities (anchoring), and the Bayesian engine ingests the LLM’s novel pattern detection via emergence signals. Where the two disagree by more than 10 percentage points, that divergence is surfaced as a signal — it means either the data hasn’t caught up to the narrative, or the narrative is wrong. Both are worth investigating.
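The divergence rule reduces to a simple comparison. A minimal sketch with a hypothetical function name; the 10-point threshold is from the text.

```python
DIVERGENCE_THRESHOLD = 0.10   # 10 percentage points


def divergence_signals(bayes: dict[str, float], llm: dict[str, float]) -> dict[str, float]:
    """Return {scenario: llm - bayes} for scenarios whose gap exceeds the threshold.

    A positive value means the LLM assigns more probability than the Bayesian engine.
    """
    return {name: llm[name] - bayes[name]
            for name in sorted(bayes.keys() & llm.keys())
            if abs(llm[name] - bayes[name]) > DIVERGENCE_THRESHOLD}


print(divergence_signals({"Credit Crisis": 0.30, "Fed Pivot": 0.20},
                         {"Credit Crisis": 0.45, "Fed Pivot": 0.25}))
```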

Scheduling Architecture

The scenario system runs on four independent pipelines, each scheduled at the cadence that matches its data source update frequency. Running pipelines more frequently than their data sources update wastes compute without improving signal quality.

Pipeline | Cadence | Schedule | Rationale
Scenario Radar | Every 6 hours | 00:00, 06:00, 12:00, 18:00 UTC | FRED updates daily, news clusters need 4–8 hours to form, and CFTC data is weekly, so more frequent runs would only recheck stale data. 4 runs per day capture all meaningful signal windows without waste.
Article Generation | Every 6 hours | 01:00, 07:00, 13:00, 19:00 UTC | Self-gates via heat classification: CRITICAL scenarios may generate daily, COLD scenarios auto-skip. Offset 1 hour after the radar to incorporate fresh signals. Internal cadence gates prevent over-publication.
Lifecycle Evaluation | Daily | 07:30 UTC | Demotion and resolution conditions track slow-moving metrics (multi-day sustained thresholds), so daily evaluation captures all transitions without redundant checks. Runs after the morning radar cycle.
Coherence Audit | Every 2 hours | Continuous | Lightweight arithmetic check (no LLM calls). Catches probability drift and constraint violations quickly. Also triggers automatically after every article generation for immediate feedback.

Pipeline Sequencing

Pipelines are staggered to allow downstream consumers to incorporate upstream results:

T+0h: Scenario Radar (collect signals, detect emergence)
T+1h: Article Generation (incorporates fresh radar signals)
T+1.5h: Lifecycle Evaluation (daily at 07:30 UTC, after the 06:00 radar and 07:00 article runs)
Continuous: Coherence Audit (every 2h and after each article generation)

22-Gate Article Pipeline

Every scenario article passes through 22 sequential gates before publication. This ensures only genuinely significant, novel, quality-validated analysis reaches the public site. Any gate failure stops the pipeline for that scenario.

1–2
Selection & Guards: Is the scenario eligible for an article today? Is there already an article in-flight? (30-minute hold prevents duplicate generation.)
3
Cadence Check: Time since last article ≥ effective cadence for current heat level. CRITICAL scenarios: daily. HOT: 2–3 days. WARM: weekly. COLD: bi-weekly.
4
Material Change: Has any watched metric moved beyond the material change threshold since the last article? Prevents publishing articles when nothing has changed.
5
Novelty Check: Cross-scenario probability shift ≥ 5 percentage points? Equity markets moved ≥ 3%? Ensures the article has something genuinely new to analyse.
6
Deduplication: Has this specific topic been covered within 48 hours? Prevents rehashing the same thesis with minor variations.
7
Enrichment: Fetch DFM factor scores, CFTC positioning data, Polymarket prices, bilateral stress scores, and NVI readings. All injected as structured context.
8–9
Evidence & Probability: Run the full Bayesian evidence assessment. Update the scenario probability. Record the prior → posterior shift and which metrics drove the change.
10
Coherence: Enforce cross-scenario probability constraints. Ensure mutually exclusive scenarios don't exceed their joint ceiling.
11
Generation: Claude receives a structured prompt with: scenario definition, evidence assessment, DFM scores, CFTC positioning, Polymarket probabilities, bilateral stress, related scenarios, derived signals, and recent data points.
12–14
Validation: Word count (900–1500), editorial tone check, consistency with evidence, mechanism grounding, deduplication. Hard gates auto-reject; soft gates route to editorial review.
15–17
Enrichment Metadata: Attach model health metrics (DFM log-likelihood, regime confidence, Brier Skill Score), probability divergence vs. prediction markets, and conviction scoring (0–10).
18–22
Publication: Store Article + ArticleVersion, generate scenario probability history, update heat classification, trigger social media summaries, archive previous versions.
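Gates 3, 5 and 6 are simple predicates, and the stop-on-first-failure behaviour can be sketched as an ordered list of them. This is an illustration, not the production pipeline: the `Candidate` fields and function names are hypothetical, HOT uses the 2-day end of its range, and the novelty gate treats either condition as sufficient, which the text leaves ambiguous.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

# HOT uses the 2-day end of its 2–3 day range; COLD is every 2 weeks.
CADENCE = {"CRITICAL": timedelta(days=1), "HOT": timedelta(days=2),
           "WARM": timedelta(days=7), "COLD": timedelta(days=14)}


@dataclass
class Candidate:
    # Hypothetical per-scenario inputs to gates 3, 5 and 6.
    heat: str
    last_article: Optional[datetime]
    max_prob_shift: float            # largest cross-scenario shift since last article
    equity_move: float               # absolute equity move since last article (fraction)
    last_same_topic: Optional[datetime]


def cadence_gate(c: Candidate, now: datetime) -> bool:
    return c.last_article is None or now - c.last_article >= CADENCE[c.heat]


def novelty_gate(c: Candidate, now: datetime) -> bool:
    # Assumption: either condition is enough.
    return c.max_prob_shift >= 0.05 or c.equity_move >= 0.03


def dedup_gate(c: Candidate, now: datetime) -> bool:
    return c.last_same_topic is None or now - c.last_same_topic >= timedelta(hours=48)


GATES: list[tuple[str, Callable[[Candidate, datetime], bool]]] = [
    ("cadence", cadence_gate), ("novelty", novelty_gate), ("dedup", dedup_gate),
]


def first_failing_gate(c: Candidate, now: datetime) -> Optional[str]:
    """Run gates in order; return the first failure's name, or None if all pass."""
    for name, gate in GATES:
        if not gate(c, now):
            return name   # any failure stops the pipeline for this scenario
    return None


now = datetime(2025, 1, 10, 12, 0)
ok = Candidate("HOT", now - timedelta(days=3), 0.06, 0.01, now - timedelta(days=4))
print(first_failing_gate(ok, now))
```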

Article Context Injection

When the article generation gate fires, the AI research desk receives a structured context payload containing every quantitative signal available. This is not a simple “write about X” prompt — it is a comprehensive data briefing that would take a human analyst hours to assemble.

Context Payload
  • Scenario definition: title, thesis, why we're tracking it, historical precedents
  • Watch metrics: current values, comparison to history, Bayesian evidence assessment per metric
  • DFM factor scores: current latent factor state + regime probability
  • CFTC positioning: net speculative, commercial net, percentile ranks
  • Polymarket: market-implied probabilities where available
  • Bilateral stress: relevant country-pair composites + velocity
  • Related scenarios: which other scenarios are linked (reinforcing, exclusive)
  • Derived signals: net liquidity, yield curve shape, real yields, inflation expectations, etc.
  • Recent macro data points from last 30 days
  • NVI: narrative velocity reading + top accelerating terms + sentiment convergence

Tunable Parameters

All model parameters are stored in a database configuration table, not hardcoded. Every change is logged with a justification and timestamp. This ensures full auditability and prevents undocumented tweaks.

Parameter | Default | Purpose
Calibration Constant | 0.30 | Sensitivity of probability to z-score evidence. Higher = more reactive.
Half-Life | 90 days | Speed of decay toward base rate in absence of evidence.
Heat Weight: Evidence | 0.35 | Contribution of z-score strength to heat classification.
Heat Weight: Threshold | 0.30 | Contribution of threshold proximity to heat.
Heat Weight: News | 0.20 | Contribution of news volume to heat.
Heat Weight: DFM Divergence | 0.15 | Contribution of Dynamic Factor Model divergence to heat. Capped at 0.3.
News Saturation Point | 10 articles/day | News volume at which the news contribution caps out.
Correlation Window | 252 days | Lookback for cross-metric correlation matrix.
Min Observations | 60 | Minimum data points for z-score computation. Below this, fall back to the threshold proxy.
Probability Floor | 5% | No scenario can go below 5%: no false certainty of impossibility.
Probability Ceiling | 95% | No scenario can exceed 95%: no false certainty of inevitability.
Demotion Prob Floor | 5% | Probability below this for 3 consecutive cycles triggers demotion to watchlist.
Demotion Cycles | 3 | Consecutive below-floor cycles required before demotion.
Promotion Signal Count | 3 | Minimum accumulated emergence signals before promotion eligibility.
Promotion Min Days | 2 | Minimum days of accumulated signals before promotion (filters transient noise).
Max Active Scenarios | 8 | Maximum scenarios in active tier.
Max Watchlist | 5 | Maximum scenarios in watchlist tier.
Max Combined Cap | 12 | Hard cap on total active + watchlist scenarios.
Regime VIX Low | 15 | VIX threshold below which regime is classified as low-vol.
Regime VIX High | 25 | VIX threshold above which regime is classified as high-vol.
Cascade Confirmation | 0.6 | Fraction of required cascade steps that must be detected for confirmation.
News Cluster Quality Floor | 3.0 | Minimum quality-weighted score for a news cluster to register as a radar signal.
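An audited parameter store can be sketched as follows. `ParameterStore` is a hypothetical in-memory stand-in for the database configuration table, holding only a few of the parameters above; the log-entry fields are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class ParameterStore:
    # In-memory stand-in for the database configuration table (hypothetical).
    values: dict[str, Any] = field(default_factory=lambda: {
        "calibration_constant": 0.30,
        "half_life_days": 90,
        "max_combined_cap": 12,
    })
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def set(self, name: str, value: Any, justification: str) -> None:
        """Change a parameter; every change is logged with a justification and timestamp."""
        if name not in self.values:
            raise KeyError(f"unknown parameter: {name}")
        if not justification.strip():
            raise ValueError("a justification is required for every parameter change")
        self.audit_log.append({
            "parameter": name,
            "old": self.values[name],
            "new": value,
            "justification": justification,
            "changed_at": datetime.now(timezone.utc),
        })
        self.values[name] = value


store = ParameterStore()
store.set("half_life_days", 60, "example change for illustration")
print(store.values["half_life_days"], len(store.audit_log))
```

Rejecting changes without a justification, rather than logging them with an empty field, is what keeps undocumented tweaks out of the table.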

Live parameter values and change history are available on the scenario methodology page.