Skip to content
Tim Frenzel

// Insight

Conformal VaR under drift: a calibration layer the regulator can read

6 min read
VaRconformal-predictionrisk-validation

Every VaR model is wrong in the same direction at the same time: the calibration that looked perfect across the sample fails exactly when volatility regimes shift, which is when the number matters. The classical fixes are model-specific patches. Regime-weighted conformal risk control is the model-agnostic version: wrap whatever VaR forecaster you already run in a conformal calibration layer whose safety buffer is computed from historical errors, weighted by recency and by similarity to the current regime. The guarantee framework is distribution-free, the machinery retrofits onto an existing risk engine, while the paper is unusually honest about which half of its idea does the work.

The mechanics are a desk-friendly version of conformal prediction’s core move. Take the base forecaster’s historical errors, compute the buffer that would have achieved the target exceedance rate, then apply it forward. The two weighting schemes decide which history counts: exponential time decay favors recent errors, and regime-similarity weights favor periods whose features, volatility level, dispersion, trend, resemble today’s. Finite-sample coverage holds under weighted exchangeability, with explicit approximation bounds when regimes drift smoothly rather than jump. The target is the regulatory standard, 99% VaR, a 1% exceedance rate.

A conformal wrapper around the VaR engine you already run
Base VaR forecaster: GBDT quantile or historical simulationHistorical forecast errorsWeight by recency + regime similarityCalibrated safety buffer99% VaR with target exceedance held under drift
Model-agnostic by construction; the base engine stays, the calibration layer wraps it.

The table, including the part that argues against the title

The evaluation runs the CRSP value-weighted US equity index, daily, 2018 through 2024, 1,751 out-of-sample days spanning the COVID crash, the 2022 rate shock, and two quiet stretches. The uncalibrated gradient-boosted quantile base misses its target by a factor of five, a 5.31% exceedance rate against the 1% target. Every conformal variant repairs it: sliding-window to 1.37%, adaptive conformal to 1.14%, time-weighted to 1.09%, regime-weighted to 1.14%. The capital cost column is where the schemes separate, with time-weighted achieving its near-target coverage at an average VaR of 165 basis points against adaptive conformal’s 200.

Exc %VaR bpBase5.31146SWC1.37182ACI1.14200TWC1.09165RWC1.14155
GBDT base, 99% VaR target of 1% exceedance; calibration buys coverage, weighting buys efficiency.
The plain-spoken headline is that time-weighting alone is the strong default, hitting 1.09% at the lowest reasonable capital, while regime-weighting earns its keep on a different base and in a different place.

Wrap historical simulation, the estimator half the industry still runs, where the regime-weighted variant is the standout: 1.09% exceedance at 247 basis points of average VaR against the unweighted wrapper’s 426, target coverage at roughly forty percent less reserved capital. The pattern makes mechanical sense. A responsive base model already encodes recent conditions, leaving regime weights little to add; a sluggish base like historical simulation leaves exactly the regime information on the table that the similarity weights recover.

The result that checks the celebration sits in the volatility quintiles: in the top quintile, every method, weighted or not, runs hot at roughly 2.29% exceedance against the 1% target. Conformal wrappers compress the stress-regime failure; they do not eliminate it. The worst environments still produce more than double the licensed exceedances, which is the residual a risk committee should see quoted next to any adoption proposal, because a calibration layer marketed as solving stress coverage has not, on this evidence, solved it.

Where this sits in the validation stack

The pattern of the result will look familiar to anyone tracking this archive’s conformal thread. Uncertainty-adjusted sorting found in January that bounds built from misspecified uncertainty still improved portfolios; here, the simpler weighting beats the cleverer one on the responsive base. Distribution-free methods keep delivering their value through discipline rather than sophistication: any reasonable conformal wrapper fixes the catastrophic miscalibration, while the refinements trade single basis points. For adoption purposes that is good news, since the version a desk can explain to a regulator in one paragraph, weight recent errors more, captures most of the benefit.

The governance case is the strongest part. It is structural rather than statistical. A conformal layer is a documented, auditable transformation sitting between the model and the reported number: the base model’s raw output, the buffer applied, the weighting scheme, plus the running exceedance count are all inspectable quantities. That converts VaR calibration from a property someone asserts about a model into a process someone can examine around it, the same shift reasoning-trace checking brought to LLM outputs. Backtesting exceptions are the most regulator-scrutinized statistic a market-risk function produces; a wrapper that holds them near target under drift, at measurable capital cost, with the expected-shortfall-aware instincts the rates literature is converging on, is the rare ML-adjacent upgrade whose paper trail is better than the incumbent’s.

Two extensions decide whether the index result transfers to a real book. The evaluation runs one value-weighted index; a multi-desk book is a portfolio of correlated exposures whose joint tail behavior the univariate wrapper never sees; the sound book-level version calibrates on the portfolio’s own P&L series rather than wrapping each sleeve separately, with the cross-sectional dependence left to the base engine where it already lives. And the wrapper’s intermediate quantities deserve promotion from diagnostics to monitoring artifacts: the buffer’s time series is an early-warning indicator in its own right, since a calibration layer that suddenly demands a wider buffer is reporting that the base model’s recent errors have fattened, often days before the exceedance count confirms it. A risk function that charts the buffer beside the VaR gets the wrapper’s opinion of the model for free.

The theory’s fine print deserves one plain-language paragraph, because it locates the stress-quintile residual precisely. The coverage guarantee holds under weighted exchangeability, with approximation bounds that degrade gracefully when regimes drift smoothly. Crashes do not drift smoothly. A COVID-style jump is exactly the discontinuity the bounds do not cover, which is why every variant ran hot in the top volatility quintile: the method extrapolates from weighted history; a regime with no usable precedent in the lookback defeats any weighting of it. That is a limitation to state rather than a flaw to fix, since no distribution-free method can manufacture information about regimes it has never seen. The practical consequence is a division of labor: the conformal layer owns calibration through ordinary drift, while stress testing and scenario analysis keep owning the jumps, the same boundary they have always owned.

Who owns which failure
Conformal wrapperOrdinary driftWeighted history informs the bufferCoverage held near targetStress and scenarioRegime jumpsNo usable precedent in lookbackOwned by stress testing, as always
The wrapper calibrates through drift; discontinuities stay where they have always lived.

The pilot is cheap because the method is a wrapper. Run it shadow-mode beside the production VaR for two quarters, three configurations, unweighted, time-weighted, regime-weighted, on your own book rather than an index. Score exceedance rate, average capital, plus the stress-quintile residual. The January-to-March lesson of this whole literature applies once more: the discipline of any calibrated bound beats the sophistication of the bound you chose; the cheapest disciplined version that your committee fully understands is usually the right one to ship.

A conformal wrapper turns a five-times-over-target VaR into near-perfect average coverage at measured capital cost: time-weighting is the default that does the work, regime-weighting pays on sluggish bases, the stress quintile stays honest at double the target.

Working on AI that needs to ship?

I help funds, fintechs, and data teams take AI from prototype to production.