Tim Frenzel

// Insights

Long reads

The pillar pieces of this archive: field guides and deep evaluations, each built to stand on its own as the one piece to read on its subject. Ten to twenty minutes apiece, several original figures each, evidence attached throughout.


20 min read

Building the agentic enterprise: a field guide

Everyone is shipping agents, and most will stall after the demo. The architecture that holds up in production, and the pre- and post-launch discipline that decides which agents survive, with the evidence attached.

10 min read

DeepSeek-V4: a million tokens of context, on weights you can own

Two MIT-licensed MoE models, V4-Pro at 1.6T parameters and V4-Flash at 284B, ship with a 1M-token default context and a production sparse-attention design. For document-heavy quant work that cannot leave the building, the cost calculus just moved again.

10 min read

Agentic reasoning, unified: a map for deciding where agents belong

A 29-author survey organizes agentic reasoning into three layers (foundational, self-evolving, and collective) and separates inference-time orchestration from post-training optimization. The taxonomy doubles as a decision tool for where agentic loops help a research desk and where they multiply p-hacking.

10 min read

Time-series foundation models in finance: what transfers and what does not

The first comprehensive test of TimesFM and Chronos on 18 million daily returns answers the question every quant has been asking: zero-shot transfer fails outright, finance-native pretraining recovers most of the gap, and a tuned gradient-boosted tree still wins on fit.

10 min read

GDPval: measuring models against working professionals

OpenAI's benchmark grades frontier models against real deliverables from professionals averaging 14 years of experience. The best model wins or ties 47.6% of blind comparisons. What that number means, and how to build your own version, matter more than the headline.

10 min read

Kronos: a foundation model for candlesticks, and the scrutiny it invites

Kronos applies the language-model recipe to market data: tokenize 12 billion candlesticks, train a decoder to predict the next one, read off forecasts. The zero-shot numbers are large. The quant's job is to ask the questions a benchmark cannot answer: about leakage, about regime dependence, and about whether forecast skill survives the cost of trading on it.

11 min read

AlphaEvolve: automated discovery, and why the evaluator is the whole game

AlphaEvolve pairs Gemini with an automated evaluator in an evolutionary loop and finds things people missed, including a way to multiply 4x4 complex-valued matrices with 48 scalar multiplications, the first improvement on Strassen's 1969 algorithm in that setting. For a quant the template is automated strategy discovery, and the lesson is severe: the loop optimizes your evaluator with superhuman efficiency, leaks included.
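The shape of that loop fits in a few lines. A minimal sketch, assuming a plain (1+λ) hill-climbing evolution over a single number rather than AlphaEvolve's LLM-proposed program edits; `evolve`, `evaluate`, and `mutate` are hypothetical names. What it shows: the loop keeps whatever the evaluator scores highest and nothing else, so any flaw in `evaluate` is exactly what gets optimized.

```python
import random

def evolve(evaluate, init, mutate, generations=200, children=8, seed=0):
    """(1+lambda) evolutionary search: keep the best candidate the evaluator has scored.

    Hypothetical sketch. AlphaEvolve instead has an LLM propose code edits,
    but the selection step below (trust the evaluator's score) is the same idea.
    """
    rng = random.Random(seed)
    best, best_score = init, evaluate(init)
    for _ in range(generations):
        for _ in range(children):
            child = mutate(best, rng)
            score = evaluate(child)
            # The loop trusts the evaluator completely: whatever it rewards wins.
            if score > best_score:
                best, best_score = child, score
    return best, best_score

# Toy objective: the evaluator rewards closeness to 3.0.
best, score = evolve(
    evaluate=lambda x: -(x - 3.0) ** 2,
    init=0.0,
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.5),
)
print(round(best, 2), score)
```

Swap the lambda for an in-sample backtest score and the same loop will climb a look-ahead bug or a leaked label as eagerly as genuine signal; scoring on data the evaluator never exposes during search is the guard.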

10 min read

Does RL really incentivize reasoning? A caution for the backtest

A sober study finds RL makes reasoning models better on the first try without expanding what they can ultimately solve. The quant analogy is exact: do not mistake variance reduction for alpha, in a model or in a trading agent.
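The distinction is between pass@1 (success on a single sample) and pass@k for large k (success somewhere among k samples). A sketch using the standard unbiased pass@k estimator from the code-generation literature, 1 - C(n-c, k)/C(n, k), with n samples per problem of which c are correct. The per-problem counts below are made-up numbers chosen to show the pattern, not results from the study.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # every k-subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(correct_counts, n: int, k: int) -> float:
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical correct-sample counts out of n=100 on four problems.
base = [10, 5, 3, 0]    # rarely right, but reaches three problems
tuned = [90, 80, 0, 0]  # reliably right, but only on two problems

for name, counts in [("base", base), ("tuned", tuned)]:
    print(name, mean_pass_at_k(counts, 100, 1), mean_pass_at_k(counts, 100, 100))
```

In this toy case tuning lifts mean pass@1 from 0.045 to 0.425 while pass@100 falls from 0.75 to 0.5: more hits on what was already reachable, fewer problems reachable at all.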

10 min read

Granular metric extraction from filings: traceability and verification beyond summarization

Clients want an agent that reads the 10-K and returns the number. Extraction, not summarization, is the hard part, and benchmarks say models fail it more than half the time. The build guide for doing it with a source on every figure and a verification gate.
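One way to make "a source on every figure" enforceable: require the model to return each number with the verbatim passage it came from, then reject any extraction whose passage is not in the filing or does not contain the number. A minimal sketch under those assumptions; the `Extraction` schema and `verify` gate are hypothetical, and a production gate would also have to check units, scale ("in millions"), parenthesized negatives, and which period a figure belongs to.

```python
import re
from dataclasses import dataclass

# Matches numbers such as "45,312", "45.3", "2023"; commas are stripped before parsing.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

@dataclass
class Extraction:
    metric: str
    value: float
    source_quote: str  # verbatim passage the model claims supports the value

def _normalize(text: str) -> str:
    # Collapse whitespace so line breaks in the filing do not defeat the match.
    return " ".join(text.split())

def verify(ex: Extraction, document: str) -> tuple[bool, str]:
    """Gate: the quote must exist in the source, and the value must appear in the quote."""
    if _normalize(ex.source_quote) not in _normalize(document):
        return False, "quote not found in source"
    numbers = [float(m.replace(",", "")) for m in NUMBER_RE.findall(ex.source_quote)]
    if not any(abs(x - ex.value) <= 1e-9 * max(1.0, abs(ex.value)) for x in numbers):
        return False, "value not in quoted passage"
    return True, "ok"

filing = """Total revenues were $45,312 million in fiscal 2023,
up from $41,002 million in fiscal 2022."""

claim = Extraction("revenue_fy2023", 45312.0, "Total revenues were $45,312 million in fiscal 2023")
print(verify(claim, filing))
```

Failures route to a human or a retry instead of into the output, so every figure that ships carries a checkable pointer back to the filing.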

10 min read

The transformer enters the SDF: complexity wins asset pricing

Kelly and coauthors implant a transformer in the stochastic discount factor and report an out-of-sample Sharpe of 4.57 against 1.77 for the best classical factor model, on sixty years of US stocks. The companion theory says why: in pricing, more factors keep winning.

10 min read

DeepSeek-R1: frontier reasoning goes open

R1 matches OpenAI's o1 on hard math and code, ships openly, and distills into small models you can host. Why the distillation result, not the benchmark parity, is what changes build-vs-buy for a quant desk.

10 min read

RAG for financial documents: a field guide

Grounding an LLM in your own filings is hard because retrieval, not the model, is the bottleneck. The proven moves that fix it, each with the evidence attached, and the discipline that makes the result safe to use.

11 min read

Model Context Protocol: the integration layer finally gets a standard

Anthropic's MCP is an open protocol that lets any model reach any data source or tool through one interface. Why a standard, modeled on LSP, is what a quant platform's integration layer has been missing.
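Concretely, MCP messages are JSON-RPC 2.0, and a client invokes a server's tool with a `tools/call` request carrying the tool's name and arguments. A sketch of that envelope only; the tool name `get_filing` and its arguments are hypothetical, and a real client would first complete MCP's initialization handshake and discover available tools with `tools/list`.

```python
import json

def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request in the shape MCP uses to invoke a server tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical tool exposed by an internal filings server.
msg = tools_call_request(1, "get_filing", {"ticker": "MSFT", "form": "10-K"})
print(msg)
```

The envelope is the same whether the server fronts a database, a risk engine, or a document store; only `name` and `arguments` change, which is the point of having one standard.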

10 min read

OpenAI Swarm: a teaching toy with a lesson worth stealing

Swarm is an experimental, MIT-licensed framework built on two primitives, agents and handoffs. It is not for production. The handoff pattern, though, is the right mental model for a research-agent stack.
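The handoff pattern is small enough to sketch without any framework: an agent either answers or names the next agent, and a loop moves control until someone answers. This is a hypothetical illustration with rule-based routing standing in for the LLM, not Swarm's actual API (in Swarm, a handoff happens when one of an agent's functions returns another `Agent`).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    name: str
    # Hypothetical contract: return (reply, next_agent); next_agent=None means "done".
    handle: Callable[[str], tuple[str, Optional[Agent]]]

def run(agent: Agent, task: str, max_handoffs: int = 5) -> list[tuple[str, str]]:
    """Pass the task along until an agent answers, recording who said what."""
    transcript = []
    for _ in range(max_handoffs + 1):
        reply, next_agent = agent.handle(task)
        transcript.append((agent.name, reply))
        if next_agent is None:
            return transcript
        agent = next_agent  # handoff: the next agent takes over the same task
    raise RuntimeError("handoff limit exceeded")  # guard against routing cycles

data_agent = Agent("data", lambda task: (f"fetched: {task}", None))
research_agent = Agent("research", lambda task: (f"drafted note: {task}", None))

def triage(task: str) -> tuple[str, Optional[Agent]]:
    # Rule-based stand-in for an LLM deciding which specialist owns the task.
    if "prices" in task:
        return "routing to data", data_agent
    return "routing to research", research_agent

triage_agent = Agent("triage", triage)
print(run(triage_agent, "daily prices for AAPL"))
```

The transcript makes every routing decision auditable, and the handoff cap turns a routing cycle into a visible error rather than a silent loop.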

10 min read

Llama 3.1 405B: a frontier model you can run behind your own firewall

Meta's 405B is the first openly available model that matches the closed frontier on knowledge, math, and code. Why that changes the build-vs-buy math for a quant desk that cannot send data to an API.

11 min read

Kolmogorov-Arnold Networks for time series: a volatility model a risk committee can read

On real implied-volatility data, T-KAN matches an LSTM with about sixty times fewer parameters and stays interpretable. The result, the architecture, and where the story gets oversold.

// Stay close to the work

Building AI that ships?

If you’re past the demo and into production, I’d love to compare notes.