
Robust combinations for the yield curve: Nelson-Siegel meets the forest

February 14, 2026 · 4 min read

yield-curve · forecasting · robust-ML

Rates desks have run Nelson-Siegel variants for decades because level, slope, and curvature are factors a committee understands. The ML literature keeps proposing replacements; this Treasury forecasting framework proposes a marriage instead. A factor-augmented dynamic Nelson-Siegel model and a random forest each forecast the curve, and a distributionally robust combination layer fuses them with weights that penalize worst-case error rather than average error. The hybrid is the kind a rates desk can actually defend: the interpretable spine stays, the nonlinear learner contributes, and the fusion rule is explicit about what it protects against.

The three components split the problem along its natural seams. The factor-augmented dynamic Nelson-Siegel keeps the classical level-slope-curvature dynamics and augments them with principal components from a panel of economic indicators, so macro information enters through an interpretable channel. The random forest works the other side: nonlinear interactions among macro-financial drivers and lagged yields that no affine structure captures. The robust combination layer then aggregates many variants of both, weighting by expected shortfall of forecast error with ridge-regularized covariance, which tunes the ensemble to fail gently in the tails rather than to shine on average.
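To make the level-slope-curvature spine concrete, here is a minimal sketch of the Nelson-Siegel factor loadings in the Diebold-Li parameterization. The decay value of 0.0609 per month is the conventional Diebold-Li choice and is an assumption here, as are the function names and the factor values; nothing in it is taken from the framework's own code.

```python
import math

# Decay rate per month. 0.0609 is the value conventionally fixed in the
# Diebold-Li formulation; treating it as fixed is an assumption of this sketch.
LAMBDA = 0.0609


def ns_loadings(tau_months, lam=LAMBDA):
    """Return the (level, slope, curvature) loadings at one maturity."""
    x = lam * tau_months
    slope = -math.expm1(-x) / x        # (1 - e^{-x}) / x, stable for small x
    curvature = slope - math.exp(-x)   # hump-shaped, vanishes at both ends
    return 1.0, slope, curvature


def ns_yield(tau_months, level, slope, curvature, lam=LAMBDA):
    """Yield implied by the three factors at the given maturity."""
    l1, l2, l3 = ns_loadings(tau_months, lam)
    return level * l1 + slope * l2 + curvature * l3


# Hypothetical factor values in percent, for illustration only.
factors = {"level": 4.5, "slope": -1.0, "curvature": 0.8}
for tau in (24, 120, 360):  # two-, ten-, and thirty-year maturities in months
    print(tau, round(ns_yield(tau, **factors), 3))
```

The short end of the curve converges to level plus slope and the long end to level alone, which is why a committee can read the three factors directly off the shape of the curve.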

Figure: Two model families, one downside-aware fusion. The fusion penalizes worst-case error; ten variants per family feed the combination.

The horizon structure of the results is the practitioner content. At the one-month horizon, the robust combinations are the clear winners: errors around 9.6 to 15.3 basis points at the two-year, 10.4 to 11.8 at the ten-year, and under 11 at the thirty-year, against 20 to 36 basis points for the factor-augmented Nelson-Siegel alone and 24 to 39 for the plain dynamic version. Stretch to longer horizons and the picture inverts. The Nelson-Siegel family deteriorates rapidly, exceeding 120 basis points in the worst cases beyond six months and spanning 94 to 143 at a year, while the random forest barely degrades at all, holding 13 to 25 basis points across the whole one-to-twelve-month range. No single model owns the curve: adaptive combinations win the short horizon, the forest owns the long one, and the term structure of model choice is itself the finding.

Figure: RMSFE at one month, best robust combination (bps), against 20-36 bps for FADNS alone and 24-39 for plain DNS at the same horizon.

That crossover deserves the desk’s attention more than any single number. Forecast-combination weights that depend on horizon are telling you the data-generating process looks different at different distances: near-term yields move with dynamics the structured model tracks, while year-ahead yields are dominated by slow-moving macro relationships, which the forest’s interactions capture but which the affine recursion can only extrapolate, compounding its errors along the way. A desk that runs one model across all horizons is implicitly averaging over that regime difference. The framework’s honest answer is a model schedule rather than a model.
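One way to read "a model schedule rather than a model" is as a lookup from horizon to the model with the lowest out-of-sample RMSFE at that horizon. The sketch below is an illustration under that assumption; the function names, model labels, and error values are hypothetical toy inputs, not the reported results.

```python
import math


def rmsfe(errors):
    """Root mean squared forecast error of a list of errors (bps)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def model_schedule(errors):
    """Map each horizon to the model with the lowest RMSFE.

    errors[model][horizon] is a list of out-of-sample forecast errors.
    """
    horizons = sorted({h for by_h in errors.values() for h in by_h})
    schedule = {}
    for h in horizons:
        scores = {m: rmsfe(by_h[h]) for m, by_h in errors.items() if h in by_h}
        schedule[h] = min(scores, key=scores.get)
    return schedule


# Made-up toy errors in basis points, keyed by horizon in months.
toy_errors = {
    "robust_combo": {1: [8, -11, 10, -9], 12: [30, -35, 28, -32]},
    "fadns": {1: [25, -30, 28, -22], 12: [110, -130, 95, -140]},
    "random_forest": {1: [18, -20, 15, -17], 12: [14, -22, 19, -16]},
}

schedule = model_schedule(toy_errors)
print(schedule)
```

In practice the selection would be made on a rolling validation window rather than once, so that the schedule can move when the regime does.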

The expected-shortfall weighting is the detail that makes this committee-grade. Standard combination weights minimize mean squared error, which happily accepts occasional disasters in exchange for average polish. Weighting by the tail of the error distribution builds the risk preference into the forecast layer itself, the same downside-first instinct that separates a robust evaluation from a flattering one. For a function that feeds hedging decisions and rate-risk limits, a forecaster tuned to its own worst case is the one whose failure modes are at least pointed in the documented direction.
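The difference between mean-squared-error weights and tail-aware weights is easy to see on a toy example. The sketch below uses a deliberately simplified rule, weights inversely proportional to each model's risk measure, and omits the ridge-regularized covariance step the framework describes; the function names and error series are hypothetical.

```python
import math


def expected_shortfall(errors, alpha=0.1):
    """Mean of the worst alpha-fraction of absolute forecast errors."""
    losses = sorted((abs(e) for e in errors), reverse=True)
    k = max(1, math.ceil(alpha * len(losses)))
    return sum(losses[:k]) / k


def mean_squared_error(errors):
    """Average squared forecast error."""
    return sum(e * e for e in errors) / len(errors)


def inverse_risk_weights(error_history, risk):
    """Weights proportional to 1 / risk(errors), normalized to sum to one.

    A simplification: cross-model error correlation is ignored here.
    """
    inv = {m: 1.0 / max(risk(e), 1e-12) for m, e in error_history.items()}
    total = sum(inv.values())
    return {m: v / total for m, v in inv.items()}


# Hypothetical error histories (bps): one steady model, one that is
# usually sharper but suffers an occasional large miss.
history = {
    "steady": [3, -3, 3, -3, 3, -3, 3, -3, 3, -3],
    "spiky": [1, -1, 1, -1, 1, -1, 1, -1, 1, 8],
}

w_mse = inverse_risk_weights(history, mean_squared_error)
w_es = inverse_risk_weights(history, expected_shortfall)
print("MSE weights:", w_mse)
print("ES weights: ", w_es)
```

On these toy inputs the mean-squared-error rule leans toward the spiky model because its average is better, while the expected-shortfall rule leans toward the steady one because its worst miss is smaller. That reversal is the risk preference being moved into the forecast layer.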

The caveats are the standard ones for forecasting papers. Errors are statistical, with no trading or hedging P&L attached, and basis-point accuracy does not translate linearly into economics once positioning and costs enter. The global-sovereign extension is reported as confirming stability without detailed numbers, which leaves the US result as the load-bearing one. And ensembles of twenty models carry real operational weight: twenty things to refit, monitor, and explain when the combination shifts its weights, which is exactly the moment a committee will ask why.

The note-sized verdict: this is what credible ML adoption looks like on a rates desk. Keep the structure regulators and committees already trust, add the learner where nonlinearity demonstrably pays, and make the combination rule carry the risk preference explicitly. The 10-basis-point one-month numbers are good; the horizon-dependent model schedule and the worst-case-aware fusion are the parts worth copying into any curve shop’s stack. Enough rate regimes have taught most of us that the tails are where forecasting reputations actually get settled.

Nelson-Siegel keeps the short horizon honest, the forest owns the long one, and an expected-shortfall-weighted combination arbitrates: the term structure of model choice is the real result, built to fail gently.
