How Convex Works
Full transparency on how we produce institutional-grade macro intelligence. Every calculation is deterministic and auditable: no black-box ML, no unexplained adjustments. This document describes the complete system, from raw data ingestion through proprietary indices, factor models, and Bayesian scenario tracking to published analysis.
System Architecture
Data flows through six layers. Each layer is fully documented on its respective page.
- FRED, GDELT, CFTC, Polymarket, RSS
- Outlier detection, z-scores, first-differences, derived signals
- PCA → EM → Kalman filter → regime detection
- CVRP, CNLI, CRAI, NVI from live data estate
- Bayesian updating, coherence enforcement, heat classification
- 22-gate quality pipeline → institutional-grade research articles
Deep Dives
Proprietary Indices
Four composite indicators (CVRP, CNLI, CRAI, and NVI) each synthesize multiple proven signals into a single actionable reading. Together they cover recession probability, net liquidity, cross-asset risk appetite, and narrative velocity.
- CVRP: 5-channel recession probability (yield curve, Sahm Rule, claims, credit, LEI)
- CNLI: Fed balance sheet minus RRP minus TGA, actual market liquidity
- CRAI: 5 cross-asset ETF ratio z-scores spanning equities, credit, sectors, EM, banks
- NVI: Real-time narrative acceleration from 46 editorially diverse news sources
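The CNLI definition above (Fed balance sheet minus RRP minus TGA) is simple enough to sketch directly. This is a minimal illustration, not the production code: the `LiquiditySnapshot` type, its field names, and the placeholder values are assumptions made for the example, and the numbers are not real readings.

```python
from dataclasses import dataclass


@dataclass
class LiquiditySnapshot:
    """Hypothetical container for one day's inputs, all in USD billions."""
    fed_balance_sheet: float  # total Federal Reserve assets
    rrp: float                # overnight reverse repo facility balance
    tga: float                # Treasury General Account balance


def net_liquidity(snapshot: LiquiditySnapshot) -> float:
    """Net liquidity as described in the text: balance sheet minus RRP minus TGA."""
    return snapshot.fed_balance_sheet - snapshot.rrp - snapshot.tga


# Placeholder values chosen for easy arithmetic, not actual data.
example = LiquiditySnapshot(fed_balance_sheet=7000.0, rrp=500.0, tga=700.0)
print(net_liquidity(example))  # 5800.0
```

Cash parked in the RRP facility or the TGA is subtracted because it sits at the Fed rather than circulating in markets, which is why the result is read as "actual market liquidity".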
Data Pipeline
Over 300 live data sources are ingested, validated, and transformed into a unified macro data estate. External feeds come from FRED, GDELT, CFTC, Polymarket, and dozens of RSS sources, alongside internally derived composite signals.
- 300+ FRED economic series across rates, inflation, credit, liquidity, labor, activity
- GDELT geopolitical event stream with 80+ trusted domain whitelist
- CFTC Commitments of Traders: speculative positioning + percentile ranks
- Polymarket prediction markets: real-time market-implied probabilities
- Dozens of RSS feeds across editorial tiers (official government, think tanks, quality news, financial wires, aggregators) for narrative-velocity signal
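Two of the transformations named on this page, z-scores on first-differences and percentile ranks for positioning, can be sketched with the standard library alone. The function names, the minimum-history rule, and the "at or below" percentile convention are assumptions for illustration; the actual pipeline may use different windows and conventions.

```python
import statistics
from typing import Optional, Sequence


def first_differences(series: Sequence[float]) -> list:
    """Period-over-period changes: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def latest_diff_zscore(series: Sequence[float]) -> Optional[float]:
    """Z-score of the most recent change against the earlier changes.

    Returns None when there is too little history or no variation.
    """
    diffs = first_differences(series)
    if len(diffs) < 3:
        return None
    history, latest = diffs[:-1], diffs[-1]
    spread = statistics.stdev(history)
    if spread == 0:
        return None
    return (latest - statistics.fmean(history)) / spread


def percentile_rank(history: Sequence[float], value: float) -> Optional[float]:
    """Share of historical observations at or below `value`, in percent."""
    if not history:
        return None
    at_or_below = sum(1 for x in history if x <= value)
    return 100.0 * at_or_below / len(history)


print(percentile_rank([10, 20, 30, 40], 30))  # 75.0
```

Scoring changes rather than levels keeps a trending series from looking permanently extreme just because its level has drifted away from its long-run mean.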
Quantitative Models
A Dynamic Factor Model reduces the macro series universe to latent factors via UD Kalman filtering and EM estimation. Hamilton 2-state regime detection classifies normal vs. stress environments. A bilateral stress tensor quantifies real-time geopolitical risk from event data.
- Dynamic Factor Model: PCA initialisation → multi-start EM → daily Kalman filter
- Hamilton regime filter: Markov-switching 2-state model on factor scores
- Bilateral stress tensor: CAMEO channel scoring across military, economic, diplomatic
- Eigenvalue crisis monitor: tracks variance compression across principal components
- Transfer entropy: directed information flow between macro series
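To make the regime-detection step concrete, here is a minimal sketch of a Hamilton-style filter for a two-state Markov-switching model with Gaussian observations. It assumes the state means, standard deviations, and transition matrix are already known; in practice they would be estimated. All parameter values below are hypothetical, and this is not presented as the production model.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hamilton_filter(
    observations: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
    transition: Sequence[Sequence[float]],
    prior: Sequence[float],
) -> List[List[float]]:
    """Filtered state probabilities P(state | data up to t) for each t.

    transition[i][j] is P(next state = j | current state = i).
    """
    n = len(means)
    probs = list(prior)
    filtered = []
    for y in observations:
        # Predict: push yesterday's probabilities through the Markov chain.
        predicted = [
            sum(probs[i] * transition[i][j] for i in range(n)) for j in range(n)
        ]
        # Update: weight each state by how well it explains today's value.
        joint = [predicted[j] * normal_pdf(y, means[j], stds[j]) for j in range(n)]
        total = sum(joint)
        # If every likelihood underflows to zero, fall back to the prediction.
        probs = [p / total for p in joint] if total > 0 else predicted
        filtered.append(probs)
    return filtered


# Hypothetical setup: state 0 = normal (mean 0), state 1 = stress (mean 3).
path = hamilton_filter(
    observations=[0.1, -0.2, 3.0, 3.5],
    means=[0.0, 3.0],
    stds=[1.0, 1.0],
    transition=[[0.95, 0.05], [0.10, 0.90]],
    prior=[0.9, 0.1],
)
for step in path:
    print([round(p, 3) for p in step])
```

In this toy run the stress probability stays near zero for the first two observations and rises sharply once the series jumps toward the stress-state mean. The persistence in the transition matrix is what stops a single noisy reading from flipping the regime outright.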
Scenario Engine
The part nobody else has. Scenarios update every 6 hours against live data using Bayesian math, not gut feel. The system discovers new scenarios autonomously, validates them through multi-source evidence gating and causal chain tracing, retires dead ones automatically, and calibrates its own accuracy over time. An LLM and a Bayesian engine run in parallel and check each other.
- Continuous 6-hour Bayesian probability updates, not static PDFs
- Autonomous scenario discovery across 6 independent data sources
- Causal cascade tracing: oil spike → news clustering → positioning shifts → inflation expectations
- Dual-brain architecture: creative LLM + disciplined Bayesian engine, divergence surfaced as signal
- 22-gate article pipeline: enrichment → evidence → probability → generation → validation
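A Bayesian probability update of the kind described above is commonly carried out in log-odds space, where each piece of evidence adds the log of its likelihood ratio. The sketch below illustrates that general technique only. The `floor` and `cap` bounds and the example likelihood ratios are hypothetical, and how the engine derives likelihood ratios from live data is not specified here.

```python
import math
from typing import Iterable


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Log-odds back to probability."""
    return 1.0 / (1.0 + math.exp(-x))


def update_probability(
    prior: float,
    likelihood_ratios: Iterable[float],
    floor: float = 0.01,  # assumed bound so a scenario never reaches exactly 0
    cap: float = 0.99,    # assumed bound so a scenario never reaches exactly 1
) -> float:
    """Posterior after applying each likelihood ratio to the prior odds.

    A ratio above 1 supports the scenario, below 1 argues against it,
    and exactly 1 leaves the probability unchanged.
    """
    log_odds = logit(prior)
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return min(cap, max(floor, sigmoid(log_odds)))


# Prior 20% (odds 1:4); evidence with ratios 2.0 and 1.5 moves odds to 3:4.
print(round(update_probability(0.20, [2.0, 1.5]), 4))  # 0.4286
```

Working in log-odds makes updates additive and order-independent, so each 6-hour cycle can apply new evidence to the previous posterior without revisiting earlier steps.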
Editorial Standards
Hand-curated long-form analysis follows a verified-facts protocol. Every numerical claim in research-driven content is sourced to either a fact ledger (each entry has a verification URL) or a hand-reviewed comparison-pairs narrative. The catch test: for any number on the page, the writer must produce a URL or code reference. Estimates and "plausible" figures are not allowed.
- Verified-facts ledger: ~130 sourced data points across April 2026 snapshots and historical anchors from 1973 forward
- Cross-source verification: single-source claims are not enough; institutional/auditor-grade rigor required
- Self-audit before push: every numerical claim re-checked against the ledger; fabrications caught and corrected pre-deploy
- Editorial dual-attribution: AI-written by Convex Research Desk, human-edited by Ben Bleier (visible byline + JSON-LD editor field)
- Live snapshot dating: every fact carries an as-of date so readers and AI assistants can judge freshness
- Used by: 41 hand-curated scenario × asset crown-jewel pages at /what-happens-when/{slug}/{asset}
Most scenario analysis is static: an analyst writes scenarios in a PDF, assigns gut-feel probabilities, and revisits them weeks later. Convex scenarios are alive: they update every 6 hours against fresh data, discover new scenarios autonomously, trace causal chains to separate signal from noise, and calibrate their own accuracy over time.
Design Principles
Deterministic & Auditable
Every probability, score, and index value is computed by documented formulas with logged parameters. No unexplained model outputs. Every parameter change is recorded with a justification.
Quantitative Rigour
Bayesian log-odds updating, z-scores on first-differences (not levels), effective degrees-of-freedom correction for correlated metrics, UD Kalman factorisation for numerical stability. Built to survive institutional scrutiny.
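One way to picture an effective degrees-of-freedom correction is the textbook formula for n equally correlated metrics with average pairwise correlation rho: n_eff = n / (1 + (n - 1) * rho). The sketch below uses that formula to damp summed log-likelihood-ratio evidence. Both the formula choice and the damping rule are assumptions for illustration, not a statement of the correction Convex applies.

```python
from typing import Sequence


def effective_n(n: int, mean_correlation: float) -> float:
    """Effective number of independent metrics under equal correlation.

    Equals n when metrics are uncorrelated and 1 when perfectly correlated.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rho = min(1.0, max(0.0, mean_correlation))  # clamp to [0, 1]
    return n / (1.0 + (n - 1) * rho)


def damped_log_evidence(
    log_likelihood_ratios: Sequence[float], mean_correlation: float
) -> float:
    """Sum of log-likelihood ratios, scaled by n_eff / n (assumed rule)."""
    n = len(log_likelihood_ratios)
    if n == 0:
        return 0.0
    return sum(log_likelihood_ratios) * effective_n(n, mean_correlation) / n


print(effective_n(4, 0.5))  # 1.6
```

With four metrics at an average correlation of 0.5, the evidence counts as 1.6 independent signals rather than 4, so correlated indicators that move together do not get counted several times over.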
Full Limitation Disclosure
Every model has known limitations. We document ours publicly, including severity ratings and mitigation strategies. We do not claim capabilities we do not have.
Multi-Source Triangulation
No single data source drives any conclusion. Indices combine 3-5 independent channels. Scenario evidence uses cross-metric correlation adjustment. Narrative analysis spans 7 editorial categories.
Graceful Degradation
Missing data does not break the system. Indices compute from available components. Evidence falls back to threshold proxies when z-scores lack history. Enrichment layers skip when external APIs are unavailable.
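The idea that indices compute from available components can be sketched as a weighted average that renormalises over whatever inputs are present. The component names, weights, and the `min_available` threshold below are hypothetical; the real indices may handle missing inputs differently.

```python
from typing import Mapping, Optional


def composite_index(
    components: Mapping[str, Optional[float]],
    weights: Mapping[str, float],
    min_available: int = 2,  # assumed minimum before a reading is published
) -> Optional[float]:
    """Weighted average over the components that actually have values.

    Missing components (None) are dropped and the remaining weights are
    rescaled to sum to 1. Returns None if too few components are present.
    """
    available = {
        name: value
        for name, value in components.items()
        if value is not None and name in weights
    }
    if len(available) < min_available:
        return None
    total_weight = sum(weights[name] for name in available)
    if total_weight <= 0:
        return None
    return sum(weights[name] * value for name, value in available.items()) / total_weight


# Hypothetical three-channel index with one channel missing.
reading = composite_index(
    {"channel_a": 1.0, "channel_b": None, "channel_c": 3.0},
    {"channel_a": 0.5, "channel_b": 0.25, "channel_c": 0.25},
)
print(round(reading, 4))  # 1.6667
```

Returning None below a minimum component count, rather than a number built from a single input, keeps a degraded reading from being mistaken for a full one.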
This methodology documentation is a living resource. It is updated as our models evolve. Questions or concerns can be directed to mail@convextrade.com.