Tim Frenzel

// Insight

GPT-5 launches with an automatic reasoning router

4 min read
GPT-5, reasoning, routing

OpenAI launched GPT-5 on Thursday. The most consequential design choice is not the capability claims. It is the plumbing. GPT-5 is not one model but a system: a fast, high-throughput model, a deeper reasoning model, plus a real-time router that decides which one answers you. The router weighs conversation type, complexity, tool needs, and explicit user intent. Underneath sit gpt-5-main and gpt-5-thinking with their mini variants, plus a nano tier in the API and a pro tier that uses parallel test-time compute.

The pitch is convenience. Users of earlier models had to pick between a quick GPT-4o-style answer and a slow o-series deliberation, and most picked wrong in one direction or the other. Now the system decides. For a consumer product that is probably the right call. For anyone running models inside a research or production pipeline, it moves a dial you used to own.

The dial you no longer hold

Start with cost and latency budgeting. With an explicit reasoning model, deliberation is a line item you choose. Routed, your spend per query becomes a distribution whose shape OpenAI controls and can change without telling you. The same prompt may burn a hundred tokens today and several thousand tomorrow because the router judged it differently. Latency inherits the same variance. A batch job that scored filings overnight at a predictable cost now carries routing risk on both dimensions.
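To make the budgeting point concrete, here is a minimal sketch of how a batch's spend moves with the share of queries the router sends down the deliberate path. The token counts, the price, and the routing shares are illustrative assumptions, not OpenAI figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    tokens_per_query: int      # assumed average tokens billed per query
    usd_per_1k_tokens: float   # assumed price, not a real rate card

    @property
    def usd_per_query(self) -> float:
        return self.tokens_per_query / 1000 * self.usd_per_1k_tokens


def batch_cost(n_queries: int, fast: Path, deep: Path, share_deep: float) -> dict:
    """Cost bounds and expectation when a share of queries routes to the deep path."""
    if not 0.0 <= share_deep <= 1.0:
        raise ValueError("share_deep must be between 0 and 1")
    expected = n_queries * (
        (1 - share_deep) * fast.usd_per_query + share_deep * deep.usd_per_query
    )
    return {
        "all_fast": n_queries * fast.usd_per_query,
        "expected": expected,
        "all_deep": n_queries * deep.usd_per_query,
    }


# Hypothetical paths: a hundred tokens on the fast path, several thousand on the deep one.
fast = Path("fast", tokens_per_query=100, usd_per_1k_tokens=0.01)
deep = Path("deep", tokens_per_query=4000, usd_per_1k_tokens=0.01)

for share in (0.2, 0.5):
    print(share, batch_cost(1000, fast, deep, share))
```

Under these assumptions, the same 1,000-query batch costs about $8.80 when a fifth of the queries route deep and about $20.50 when half do, with nothing changed on your side.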

Reproducibility is the deeper problem. When the router is invisible, the same prompt can take a different path on different days, leaving you unable to tell a model change from a routing change. Anyone who has chased a backtest discrepancy across data-vendor revisions knows this failure class: the inputs looked identical, the plumbing was not. A research pipeline that compares model outputs across weeks needs the path pinned, which in practice means calling the explicit API variants rather than the routed surface.
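Pinning the path can be sketched as a thin wrapper that names the model explicitly and records which model the provider says answered. The `call_model` signature and the response shape are hypothetical stand-ins for whatever client you use, and the model string follows this article's naming, which may not match the identifier the API expects.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable

# Hypothetical client signature: takes (model, prompt) and returns a dict with
# at least "model" (the model reported as having answered) and "text".
CallModel = Callable[[str, str], dict]


def pinned_call(call_model: CallModel, model: str, prompt: str, log: list) -> str:
    """Call an explicitly named model and append an audit record to `log`."""
    response = call_model(model, prompt)
    log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "requested_model": model,
        "answered_model": response.get("model"),
        "pinned_ok": response.get("model") == model,
    })
    return response["text"]


# Stand-in for a real API client so the sketch runs offline.
def fake_client(model: str, prompt: str) -> dict:
    return {"model": model, "text": f"[{model}] answer"}


audit_log: list = []
answer = pinned_call(fake_client, "gpt-5-thinking", "Score this filing.", audit_log)
print(json.dumps(audit_log[-1], indent=2))
```

A real provider may report a dated snapshot name rather than the alias you requested, so the strict equality behind `pinned_ok` would likely need loosening to a prefix or lookup check in practice.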

GPT-5: one entry point, a router picks the depth
[Figure: Query -> Real-time router (type, complexity, tools, intent) -> gpt-5-main (fast) or gpt-5-thinking (deliberate) -> Response]
The router rather than the user picks the path on every query.

The launch itself made the case better than any hypothetical. A day in, Sam Altman acknowledged that “the autoswitcher broke and was out of commission for a chunk of the day,” promising GPT-5 “will seem smarter starting today.” Read that as an operations note: the perceived intelligence of the flagship model dropped for part of a day because a routing component failed, and users had no way to see it. That is a new dependency class sitting between you and the weights.

Routing is fine. Opaque routing is not.

None of this is an argument against routing. Routing queries by difficulty is sound engineering that this archive has endorsed before: the Self-Route pattern from the RAG versus long-context study sends easy queries down a cheap path and escalates the hard minority, cutting cost at equal accuracy. The difference is ownership. Self-Route is a dial you build, observe, and tune against your own cost curve. GPT-5’s router is a dial someone else holds, adjusted on their schedule, with telemetry you do not get.
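A router you own can be very small. The sketch below loosely follows the Self-Route idea as described here: try the cheap path first, escalate only when it declines, and log every decision. The sentinel string and both path functions are hypothetical stand-ins, not the study's implementation.

```python
from typing import Callable

# Assumed sentinel that the cheap path is prompted to return when it cannot answer.
UNANSWERABLE = "UNANSWERABLE"


def self_route(
    query: str,
    cheap: Callable[[str], str],
    expensive: Callable[[str], str],
    decisions: list,
) -> str:
    """Try the cheap path first; escalate only if it declines, and log the path."""
    answer = cheap(query)
    escalated = answer.strip().upper() == UNANSWERABLE
    if escalated:
        answer = expensive(query)
    decisions.append({"query": query, "path": "expensive" if escalated else "cheap"})
    return answer


# Hypothetical stand-ins for the two model paths.
def cheap_path(query: str) -> str:
    return "UNANSWERABLE" if "compare" in query.lower() else "short answer"


def expensive_path(query: str) -> str:
    return "long-context answer"


decisions: list = []
for q in ("What is the filing date?", "Compare risk factors across all years."):
    print(q, "->", self_route(q, cheap_path, expensive_path, decisions))
```

The `decisions` list is the part a hosted router does not give you: a per-query record of which path ran, which you can tune against your own cost curve.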

A desk parallel makes the stakes concrete. A smart order router that picks venues for your orders is valuable exactly up to the point where you stop seeing why it routed where it did. No execution desk would accept venue selection with no fill telemetry. Treat model routing the same way: take the convenience where outcomes are low-stakes, and demand the explicit path wherever you must explain a result later.

The practical playbook for now is short. Consumer-grade and internal-tooling use can ride the router and pocket the convenience. Anything feeding research, client output, or a regulated process should pin explicit variants, log which model answered, and treat any unexplained quality shift as possible routing drift rather than model drift.
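The last step of that playbook can be sketched as a triage rule over two logged runs of the same prompt. The record fields (`prompt_sha256`, `answered_model`) are an assumed logging schema, and the returned labels are suggestions for where to look first, not a diagnosis.

```python
def triage_shift(baseline: dict, current: dict) -> str:
    """Suggest where to look first when output quality shifts between two runs."""
    if baseline["prompt_sha256"] != current["prompt_sha256"]:
        return "input changed: not a like-for-like comparison"
    before = baseline.get("answered_model")
    after = current.get("answered_model")
    if before is None or after is None:
        return "path unknown: treat as possible routing drift"
    if before != after:
        return f"path changed ({before} -> {after}): routing drift"
    return "same path: investigate model update or sampling variance"


# Hypothetical records for the same prompt on two different days.
baseline = {"prompt_sha256": "abc", "answered_model": "gpt-5-thinking"}
current = {"prompt_sha256": "abc", "answered_model": "gpt-5-main"}
print(triage_shift(baseline, current))
```

The rule defaults to suspecting the route whenever the path was not recorded, which matches the advice above: without a logged path you cannot rule routing out.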

Two ways to consume a routed flagship
[Figure: Ride the router: consumer chat, internal tooling, low-stakes drafts. Pin the variant: research pipelines, client output, regulated processes.]
Convenience where outcomes are cheap; pinned, logged paths where answers must be explained.

The reasoning tier itself continues the trajectory o1 started: deliberation as priced compute. What changed this week is who decides when you pay for it.

GPT-5’s router trades user control for convenience: take the trade in low-stakes work, and pin the explicit model anywhere you must reproduce or explain the answer.
