CONVEX

How Convex Works

Full transparency on how we produce institutional-grade macro intelligence. Every calculation is deterministic and auditable: no black-box ML, no unexplained adjustments. This document describes the complete system, from raw data ingestion through proprietary indices, factor models, and Bayesian scenario tracking to published analysis.

System Architecture

Data flows through six layers. Each layer is fully documented on its respective page.

01 Ingestion

FRED, GDELT, CFTC, Polymarket, RSS

02 Transformation

Outlier detection, z-scores, first-differences, derived signals

03 Factor Modelling

PCA → EM → Kalman filter → regime detection

04 Index Computation

CVRP, CNLI, CRAI, NVI from live data estate

05 Scenario Intelligence

Bayesian updating, coherence enforcement, heat classification

06 Analysis Generation

22-gate quality pipeline → institutional-grade research articles

Deep Dives

01

Proprietary Indices

Four composite indicators (CVRP, CNLI, CRAI, and NVI) each synthesize multiple proven signals into a single actionable reading. Together they cover recession probability, net liquidity, cross-asset risk appetite, and narrative velocity.

  • CVRP: 5-channel recession probability (yield curve, Sahm Rule, claims, credit, LEI)
  • CNLI: Fed balance sheet minus RRP minus TGA, actual market liquidity
  • CRAI: 5 cross-asset ETF ratio z-scores spanning equities, credit, sectors, EM, banks
  • NVI: Real-time narrative acceleration from 46 editorially diverse news sources
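
The CNLI bullet above describes a simple subtraction. A minimal sketch, assuming all three inputs are expressed in the same unit (here USD billions); the function name and the sample figures are hypothetical and are not actual Convex or FRED values:

```python
def net_liquidity(fed_balance_sheet: float, rrp: float, tga: float) -> float:
    """Fed balance sheet minus reverse repo (RRP) minus Treasury General Account (TGA).

    All inputs are assumed to share one unit, e.g. USD billions.
    """
    return fed_balance_sheet - rrp - tga


# Hypothetical round numbers in USD billions, for illustration only.
example = net_liquidity(fed_balance_sheet=7000.0, rrp=500.0, tga=700.0)
print(example)  # 5800.0
```

Any smoothing, scaling, or normalisation applied on top of this difference is not described here and is left out of the sketch.
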
02

Data Pipeline

Over 300 live data sources ingested, validated, and transformed into a unified macro data estate. External feeds from FRED, GDELT, CFTC, Polymarket, and dozens of RSS sources, plus internally derived composite signals.

  • 300+ FRED economic series across rates, inflation, credit, liquidity, labor, activity
  • GDELT geopolitical event stream with 80+ trusted domain whitelist
  • CFTC Commitments of Traders: speculative positioning + percentile ranks
  • Polymarket prediction markets: real-time market-implied probabilities
  • Dozens of RSS feeds across editorial tiers (official government, think tanks, quality news, financial wires, aggregators) for narrative-velocity signal
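
The percentile ranks mentioned for CFTC positioning can be illustrated with a short sketch. It assumes the rank is the share of historical observations at or below the latest reading; the function name, the lookback, and the sample positions are hypothetical:

```python
def percentile_rank(history: list[float], latest: float) -> float:
    """Share of historical observations at or below `latest`, on a 0-100 scale."""
    if not history:
        raise ValueError("history must not be empty")
    at_or_below = sum(1 for value in history if value <= latest)
    return 100.0 * at_or_below / len(history)


# Hypothetical net speculative positions (thousands of contracts).
net_positions = [-20.0, -5.0, 0.0, 10.0, 15.0, 30.0, 45.0, 60.0]
print(percentile_rank(net_positions, 30.0))  # 75.0
```

A reading near 0 or 100 marks positioning that is extreme relative to the chosen history, which is why the lookback window matters as much as the formula.
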
03

Quantitative Models

A Dynamic Factor Model reduces the macro series universe to latent factors via UD Kalman filtering and EM estimation. Hamilton 2-state regime detection classifies normal vs. stress environments. A bilateral stress tensor quantifies real-time geopolitical risk from event data.

  • Dynamic Factor Model: PCA initialisation → multi-start EM → daily Kalman filter
  • Hamilton regime filter: Markov-switching 2-state model on factor scores
  • Bilateral stress tensor: CAMEO channel scoring across military, economic, diplomatic
  • Eigenvalue crisis monitor: tracks variance compression across principal components
  • Transfer entropy: directed information flow between macro series
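
To make the regime-detection step concrete, here is a minimal sketch of the filtering recursion in a two-state Markov-switching model. It assumes Gaussian emissions and fixed parameters; the means, standard deviations, transition matrix, and factor scores below are hypothetical, and a production filter would estimate the parameters rather than hard-code them:

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hamilton_filter(obs, means, stds, transition, prior):
    """Return filtered [P(normal), P(stress)] after each observation.

    transition[i][j] = P(next state is j | current state is i).
    """
    belief = list(prior)
    filtered = []
    for y in obs:
        # Predict: push yesterday's belief through the transition matrix.
        predicted = [
            sum(belief[i] * transition[i][j] for i in range(2)) for j in range(2)
        ]
        # Update: weight each state by how well it explains today's observation.
        joint = [predicted[j] * gaussian_pdf(y, means[j], stds[j]) for j in range(2)]
        total = sum(joint)
        belief = [p / total for p in joint]
        filtered.append(belief)
    return filtered


# Hypothetical factor scores: calm at first, then a jump.
scores = [0.1, -0.2, 0.0, 2.5, 3.1, 2.8]
probs = hamilton_filter(
    scores,
    means=[0.0, 3.0],                        # state 0 = normal, state 1 = stress
    stds=[1.0, 1.5],
    transition=[[0.95, 0.05], [0.10, 0.90]],
    prior=[0.9, 0.1],
)
for score, (_, p_stress) in zip(scores, probs):
    print(f"score={score:+.1f}  P(stress)={p_stress:.3f}")
```

The persistent transition matrix is what keeps a single noisy observation from flipping the regime call: the stress probability only climbs decisively once consecutive readings agree.
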
04

Scenario Engine

The part nobody else has. Scenarios update every 6 hours against live data using Bayesian math, not gut feel. The system discovers new scenarios autonomously, validates them through multi-source evidence gating and causal chain tracing, retires dead ones automatically, and calibrates its own accuracy over time. An LLM and a Bayesian engine run in parallel and check each other.

  • Continuous 6-hour Bayesian probability updates, not static PDFs
  • Autonomous scenario discovery across 6 independent data sources
  • Causal cascade tracing: oil spike → news clustering → positioning shifts → inflation expectations
  • Dual-brain architecture: creative LLM + disciplined Bayesian engine, divergence surfaced as signal
  • 22-gate article pipeline: enrichment → evidence → probability → generation → validation
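
A Bayesian probability update of the kind described above can be sketched in log-odds form. This is a minimal illustration, assuming each piece of evidence is summarised as a likelihood ratio and that the pieces are independent; the function names and the sample ratios are hypothetical, and the sketch omits the correlation adjustment and coherence enforcement that the methodology mentions:

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Add the log of each likelihood ratio to the prior log-odds.

    A ratio above 1 supports the scenario; a ratio below 1 counts against it.
    """
    log_odds = logit(prior)
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return sigmoid(log_odds)


# Hypothetical: 20% prior, two supporting pieces of evidence.
posterior = update_probability(0.20, [2.0, 1.5])
print(round(posterior, 4))  # 0.4286
```

Working in log-odds keeps each update additive, so every 6-hour cycle can log exactly which evidence moved the probability and by how much.
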
05

Editorial Standards

Hand-curated long-form analysis follows a verified-facts protocol. Every numerical claim in research-driven content is sourced to either a fact ledger (each entry has a verification URL) or a hand-reviewed comparison-pairs narrative. The catch test: for any number on the page, the writer must produce a URL or code reference. Estimates and "plausible" figures are not allowed.

  • Verified-facts ledger: ~130 sourced data points across April 2026 snapshots and historical anchors from 1973 forward
  • Cross-source verification: single-source claims are not enough; institutional/auditor-grade rigor required
  • Self-audit before push: every numerical claim re-checked against the ledger; fabrications caught and corrected pre-deploy
  • Editorial dual-attribution: AI-written by Convex Research Desk, human-edited by Ben Bleier (visible byline + JSON-LD editor field)
  • Live snapshot dating: every fact carries an as-of date so readers and AI assistants can judge freshness
  • Used by: 41 hand-curated scenario × asset crown-jewel pages at /what-happens-when/{slug}/{asset}
The interesting sauce

Most scenario analysis is static: an analyst writes scenarios in a PDF, assigns gut-feel probabilities, and revisits them weeks later. Convex scenarios are alive: they update every 6 hours against fresh data, discover new scenarios autonomously, trace causal chains to separate signal from noise, and calibrate their own accuracy over time. An LLM and a Bayesian engine run in parallel and check each other.

By the Numbers

300+ FRED economic series
4 Proprietary indices
22 Quality gates per article
80+ Tracked narrative terms
6 Geopolitical pillars
10 Derived macro signals

Design Principles

Deterministic & Auditable

Every probability, score, and index value is computed by documented formulas with logged parameters. No unexplained model outputs. Every parameter change is recorded with a justification.

Quantitative Rigour

Bayesian log-odds updating, z-scores on first-differences (not levels), effective degrees-of-freedom correction for correlated metrics, UD Kalman factorisation for numerical stability. Built to survive institutional scrutiny.
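
The phrase "z-scores on first-differences (not levels)" can be made concrete with a short sketch. It assumes the latest change is scored against the mean and sample standard deviation of the earlier changes; the function name, that exclusion choice, and the sample series are hypothetical:

```python
import statistics


def first_difference_zscore(levels: list[float]) -> float:
    """Z-score of the most recent change relative to the history of prior changes."""
    diffs = [later - earlier for earlier, later in zip(levels, levels[1:])]
    history, latest = diffs[:-1], diffs[-1]
    if len(history) < 2:
        raise ValueError("need at least four levels to score the latest change")
    mean = statistics.fmean(history)
    std = statistics.stdev(history)  # sample standard deviation
    if std == 0.0:
        raise ValueError("history of changes has zero variance")
    return (latest - mean) / std


# Hypothetical series: steady drift, then a jump in the final observation.
series = [100.0, 101.0, 103.0, 104.0, 106.0, 112.0]
print(round(first_difference_zscore(series), 3))  # 7.794
```

Scoring changes rather than levels means a trending series is not flagged simply for being high; only a move that is unusual relative to its own recent moves produces a large z-score.
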

Full Limitation Disclosure

Every model has known limitations. We document ours publicly, including severity ratings and mitigation strategies. We do not claim capabilities we do not have.

Multi-Source Triangulation

No single data source drives any conclusion. Indices combine 3-5 independent channels. Scenario evidence uses cross-metric correlation adjustment. Narrative analysis spans 7 editorial categories.
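
One common way to adjust for correlated evidence is to shrink the count of metrics to an effective number of independent ones. The sketch below uses the equicorrelation formula N_eff = N / (1 + (N - 1) * rho); whether Convex uses this exact formula is an assumption, and the function name and inputs are hypothetical:

```python
def effective_count(n: int, mean_correlation: float) -> float:
    """Effective number of independent metrics under an average pairwise correlation.

    Assumes equal pairwise correlation between 0 (independent) and 1 (identical).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= mean_correlation <= 1.0:
        raise ValueError("mean_correlation must be in [0, 1]")
    return n / (1.0 + (n - 1) * mean_correlation)


print(effective_count(4, 0.0))  # 4.0
print(effective_count(4, 0.5))  # 1.6
print(effective_count(4, 1.0))  # 1.0
```

Four metrics that move together at an average correlation of 0.5 carry roughly the weight of 1.6 independent ones, which prevents the same underlying signal from being counted several times.
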

Graceful Degradation

Missing data does not break the system. Indices compute from available components. Evidence falls back to threshold proxies when z-scores lack history. Enrichment layers skip when external APIs are unavailable.
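
Computing an index from available components can be sketched as a weighted average that renormalises over whatever inputs are present. This is an illustration only: the component names, weights, and the renormalisation rule are assumptions rather than the documented index formulas:

```python
from typing import Optional


def composite_from_available(
    components: dict[str, Optional[float]],
    weights: dict[str, float],
) -> Optional[float]:
    """Weighted average over the components that are present (not None)."""
    available = {name: value for name, value in components.items() if value is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0.0:
        return None  # nothing usable: report "no reading" instead of failing
    return sum(weights[name] * value for name, value in available.items()) / total_weight


# Hypothetical channel readings; one feed is missing.
readings = {"yield_curve": 0.6, "sahm_rule": None, "claims": 0.2}
channel_weights = {"yield_curve": 0.5, "sahm_rule": 0.3, "claims": 0.2}
print(round(composite_from_available(readings, channel_weights), 4))  # 0.4857
```

Returning `None` when every component is missing keeps a total outage distinguishable from a genuine reading of zero.
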

This methodology documentation is a living resource. It is updated as our models evolve. Questions or concerns can be directed to mail@convextrade.com.