Tim Frenzel

// Insight

Zero-shot foundation models beat the USDA on softs

3 min read
TSFM, commodities, forecasting

Three months ago this archive’s longread on time-series foundation models delivered the negative verdict: zero-shot transfer fails on equity returns, badly. This study supplies the other half of the map. On agricultural commodity prices, zero-shot foundation models beat the conventional methods consistently, with Time-MoE improving on USDA benchmarks by 54.9% on wheat and 18.5% on corn. Same model class, opposite outcome. The difference between the two results is more useful than either alone.

The evaluation is clean enough to trust the direction. Five foundation models (Chronos, Chronos-2, TimesFM 2.5, Time-MoE, and Moirai-2) forecast four major agricultural commodities on USDA ERS monthly price data spanning 1997 through 2025, scored on recent data from 2017 to 2024 with the COVID dislocation excluded. The benchmarks are not strawmen: USDA’s season-average price forecasts incorporate futures prices, which makes them partially forward-looking. The foundation models beat the futures-informed USDA forecasts on three of four commodities while reading nothing but price history.

Time-MoE improvement vs USDA ERS benchmark, 2017-2024 ex-COVID (%)
Wheat: 54.9; Corn: 18.5
Zero-shot, monthly USDA data; futures-informed USDA forecasts beaten on three of four commodities.
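The note does not say which error metric sits behind the improvement figures. As a minimal sketch, assuming the percentages are relative reductions in mean absolute error against the benchmark forecast, the arithmetic would look like the following. The function names and all prices are hypothetical, chosen only to make the calculation visible.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute gap between realized prices and forecasts."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def improvement_pct(benchmark_error, model_error):
    """Percent reduction in error relative to the benchmark forecast."""
    if benchmark_error <= 0:
        raise ValueError("benchmark error must be positive")
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# Made-up numbers for illustration only; not USDA ERS data.
realized = [5.0, 5.5, 6.0, 5.8]
benchmark_forecast = [4.0, 6.5, 5.0, 6.8]  # each forecast misses by about 1.0
model_forecast = [4.5, 6.0, 5.5, 6.3]      # each forecast misses by about 0.5

gain = improvement_pct(
    mean_absolute_error(realized, benchmark_forecast),
    mean_absolute_error(realized, model_forecast),
)
print(f"improvement vs benchmark: {gain:.1f}%")
```

Under this reading, a 54.9% improvement would mean the model's average error is a little under half the benchmark's. If the study scored on a different metric, the same ratio applies with that metric swapped in.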

The reconciliation with December is the actual lesson, and the one a desk should internalize before deploying either result. Equity returns are nearly efficient: the signal-to-noise ratio sits close to zero, there are no stable shapes for a pretrained model to transfer, and a generic prior actively hurts. Commodity prices are a different statistical animal: autocorrelated levels with seasonality, supply-cycle structure, and slow mean reversion, exactly the repeating patterns that populate pretraining corpora. Zero-shot transfer works where shapes exist. The model class did not change between December and now; the data-generating process did, which is the entire story of where foundation models belong in finance.
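A toy simulation can make the contrast concrete. The sketch below assumes a persistent AR(1) process as a stand-in for commodity price levels and independent Gaussian noise as a stand-in for equity returns. Both are illustrative assumptions, not the study's data or models, and the "last value" rule is only a crude proxy for any forecaster that leans on recent history. The constant-mean forecast uses the in-sample mean, a simplification that slightly flatters it.

```python
import random


def simulate_ar1(n, phi, rng):
    """AR(1) around zero: x_t = phi * x_{t-1} + eps_t, with eps_t ~ N(0, 1)."""
    x, path = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def mse_last_value(series):
    """MSE of forecasting each point with the previous observation."""
    errors = [(series[t] - series[t - 1]) ** 2 for t in range(1, len(series))]
    return sum(errors) / len(errors)


def mse_constant_mean(series):
    """MSE of always forecasting the series mean (history ignored)."""
    mean = sum(series) / len(series)
    errors = [(x - mean) ** 2 for x in series[1:]]
    return sum(errors) / len(errors)


rng = random.Random(42)
levels = simulate_ar1(2000, 0.9, rng)   # persistent, "has shapes"
returns = simulate_ar1(2000, 0.0, rng)  # phi = 0 is pure noise

for name, series in (("levels", levels), ("returns", returns)):
    print(
        f"{name}: last-value MSE={mse_last_value(series):.2f}, "
        f"constant-mean MSE={mse_constant_mean(series):.2f}"
    )
```

On the persistent series, leaning on history should cut the error sharply relative to the constant forecast; on the noise series, the same habit should roughly double it. That is the sense in which a history-based prior helps on one data process and hurts on the other.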

Racing offers the precise version of the distinction. Forecasting a rival’s lap time is a fool’s errand, too much noise, too much strategy. Forecasting tyre degradation across a stint is routine engineering, because rubber wears on physics with repeatable structure. The teams that win know which of their two forecasting problems is which. Equity returns are the rival’s lap time; commodity price levels, at monthly frequency, wear like the tyre.

The efficiency question is the tradable angle, raised carefully. Beating a futures-informed forecast with a price-history-only model suggests either that the futures curve embeds risk premia and hedging-pressure distortions the model sidesteps, or that monthly agricultural prices are simply less efficient than equity desks assume. Both readings are plausible; neither is yet a strategy. A forecast-accuracy edge in percentage terms is not a P&L statement: monthly horizons mean slow capital turns, basis and roll costs eat level-forecast advantages, and the 2017-2024 ex-COVID window is one regime by commodity standards. The honest next step is to run the paper’s design against tradable instruments, futures rather than ERS cash series, with costs in. That is the same gap the December evaluation flagged between paper-frame Sharpes and money.

Two details keep the note straight. The fourth commodity resisted, with the futures-informed USDA forecast holding its edge there, a reminder that within the asset class the shape argument varies crop by crop, storage regime by storage regime. And the winner being Time-MoE, a sparse mixture-of-experts forecaster, says the architecture trend in the broader TSFM evaluation carries to the domain where transfer actually works.

For a desk holding both results, the deployment map sorts itself by data process rather than by asset class fashion. Structured, autocorrelated series at monthly frequency (commodity levels, yield-curve points, volumes and flows) are candidates for zero-shot foundation models today, with the boring benchmark run beside them. Near-efficient return series stay with the December verdict: finance-native pretraining or nothing. The screening question for any new series takes one afternoon and comes before anyone debates which foundation model to download: does the series have shapes, as measured by autocorrelation structure and seasonality strength?
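The screening question above can be sketched in a few lines. This is one simple way to measure "shapes", not a method taken from the study: lag autocorrelation is computed directly, and seasonality strength is proxied by the share of variance explained by position-in-cycle means. The function names, the 12-month period default, and the thresholds in `has_shapes` are all hypothetical. A trending real series would likely need detrending first.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


def seasonal_strength(series, period=12):
    """Share of total variance explained by position-in-cycle means (0 to 1)."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    if total == 0:
        return 0.0
    buckets = [[] for _ in range(period)]
    for t, x in enumerate(series):
        buckets[t % period].append(x)
    between = sum(len(b) * (sum(b) / len(b) - mean) ** 2 for b in buckets if b)
    return between / total


def has_shapes(series, period=12, acf_floor=0.3, season_floor=0.2):
    """Hypothetical screen: pass if either measure clears its (arbitrary) floor."""
    return (
        abs(autocorrelation(series, 1)) >= acf_floor
        or seasonal_strength(series, period) >= season_floor
    )


# Synthetic stand-ins, not market data: a seasonal level series and pure noise.
rng = random.Random(0)
seasonal = [100 + 10 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 2) for t in range(240)]
noise = [rng.gauss(0, 1) for _ in range(240)]

for name, series in (("seasonal", seasonal), ("noise", noise)):
    print(
        f"{name}: acf1={autocorrelation(series, 1):.2f}, "
        f"seasonal={seasonal_strength(series):.2f}, shapes={has_shapes(series)}"
    )
```

The thresholds are placeholders to be calibrated per desk; the useful output is the pair of numbers, which can be compared across candidate series before any model is downloaded.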

Zero-shot foundation models beat futures-informed USDA forecasts on three of four commodities, 54.9% better on wheat, because commodity prices have shapes and equity returns do not: deploy by data process, never by model fashion.
