News · 2026-06-26

Frontier AI is getting more expensive while open models keep getting cheaper

For years the story of AI pricing was a one-way slide downward: every few months the cost of using a capable model fell. That story is splitting in two. As an analysis from the inference company Doubleword lays out, the price of frontier closed models is now ticking upward while open-weight models, many of them Chinese, keep cutting theirs. The gap between the two worlds is widening, and access policy is widening it further.

The background. When you use a model through an API, you pay per token, a chunk of text roughly the size of a short word, counted on both the input and the output. Two things set that price: how expensive the model is to run, and how much pricing power the provider has. For a while, competition and efficiency gains pushed prices down across the board. What is new is that the top closed models, the new GPT-5.6 flagship and Anthropic's Fable and Mythos line, are priced at a premium and, in some cases, gated behind government clearance. Meanwhile, a Chinese open-weight model like DeepSeek's latest just received a permanent price cut, and you can download it and run it yourself for the cost of the hardware.
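To make the per-token arithmetic concrete, here is a minimal sketch. It assumes the common convention of quoting separate input and output prices in dollars per million tokens. Every price below is a hypothetical placeholder, not a figure from Doubleword or from any provider's price list, and the function and tier names are invented for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one API call billed per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices in dollars per million tokens (illustrative only).
PREMIUM_TIER = {"input": 10.0, "output": 30.0}
OPEN_TIER = {"input": 0.5, "output": 1.5}

# One request: 2,000 tokens in, 500 tokens out.
premium_cost = request_cost(2000, 500, PREMIUM_TIER["input"], PREMIUM_TIER["output"])
open_cost = request_cost(2000, 500, OPEN_TIER["input"], OPEN_TIER["output"])

print(f"premium tier: ${premium_cost:.5f} per request")
print(f"open tier:    ${open_cost:.5f} per request")
print(f"ratio:        {premium_cost / open_cost:.1f}x")
```

The ratio here is purely a product of the made-up prices. The point is only that a per-request difference of a few cents compounds quickly once a workload runs to millions of requests.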

What is actually happening, and why. Frontier labs are spending colossal sums on training and on the specialized chips to serve their models, and as their models pull ahead on the hardest tasks, they can charge for that lead. At the same time, the supply of strong open-weight models keeps growing, and open models compete on price because no single company controls them, anyone can host them, and hosts undercut each other. The result is a market splitting into a premium, gated tier and a cheap, abundant tier, with the middle hollowing out.

Doubleword's own angle illustrates one way the cheap tier gets cheaper: not every job needs an instant answer. The company offers async and batch processing, where you accept a wait, sometimes up to a day, in exchange for a steep discount. For workloads like running AI agents overnight, scoring thousands of evaluations, or bulk-processing documents, latency does not matter, so paying a premium for instant responses is pure waste.
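The trade described above can be sketched as a simple routing rule: send a job to batch whenever it can tolerate the batch window, and pay the real-time price only when it cannot. This is an illustration under stated assumptions, not Doubleword's API. The 24-hour window reflects the "up to a day" wait mentioned above, while the base price and the discount are hypothetical placeholders.

```python
from dataclasses import dataclass

BATCH_WINDOW_HOURS = 24.0    # assumption: worst-case batch wait of one day
REALTIME_PRICE_PER_M = 2.0   # hypothetical dollars per million tokens
BATCH_DISCOUNT = 0.5         # hypothetical: batch costs half the real-time price


@dataclass
class Job:
    name: str
    tokens: int
    max_wait_hours: float  # how long the caller can wait for the result


def choose_tier(job):
    """Use batch only if the job can tolerate the full batch window."""
    return "batch" if job.max_wait_hours >= BATCH_WINDOW_HOURS else "realtime"


def job_cost(job):
    """Dollar cost of the job on whichever tier it is routed to."""
    price = REALTIME_PRICE_PER_M
    if choose_tier(job) == "batch":
        price *= 1 - BATCH_DISCOUNT
    return job.tokens * price / 1_000_000


jobs = [
    Job("live chat reply", tokens=3_000, max_wait_hours=0.01),
    Job("overnight eval run", tokens=50_000_000, max_wait_hours=24.0),
]

for job in jobs:
    print(f"{job.name}: {choose_tier(job)}, ${job_cost(job):.3f}")
```

Under these placeholder numbers the overnight evaluation run dominates the bill, so routing it to batch is where nearly all of the saving comes from. The live reply stays on the real-time tier because it cannot wait.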

An analogy. Think of shipping. Frontier closed models are overnight express: fast, premium-priced, and increasingly you need to be a verified account to even use the fastest tier. Open models on batch infrastructure are ground freight: slower, dramatically cheaper, and good enough for anything that is not on fire. The smart operator does not ship everything overnight; they reserve express for the few jobs that truly need it and send the rest by freight. As frontier express prices rise and gates go up, more cargo moves to freight.

Why it matters: this reversal, layered on top of government vetting for the best closed models, is pushing builders toward open weights, not as a budget compromise but as a strategy. If your application can run on a strong open model you host yourself, you are insulated from price hikes, rate limits, and the risk that a model you depend on gets switched off by a directive, as nearly happened with Mythos. For startups and researchers without a government clearance, the open tier is increasingly the only frontier they can actually reach.

The honest caveat: cheaper is not free, and open is not effortless. Doubleword is a vendor making a vendor's argument, and its cost comparisons are its own, so treat the specific multipliers as marketing until you measure your own workload. Running open models yourself means owning the hardware, the scaling, the reliability, and the security, a real operational burden that the per-token price hides. And the very best models, for now, still tend to live on the closed, premium, gated side of the line. The reversal is real and important, but it is a shift in the trade-offs, not a verdict that one side has won.
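One way to measure your own workload, as the caveat suggests, is a break-even estimate: the monthly token volume at which a fixed self-hosting bill equals what a hosted API would have charged. The sketch below is deliberately crude. It folds hardware, staffing, and reliability work into a single monthly figure and ignores utilization limits, and both dollar amounts are hypothetical placeholders rather than measured costs.

```python
def breakeven_tokens_per_month(monthly_fixed_cost, api_price_per_m):
    """Monthly token volume above which self-hosting beats the hosted API.

    monthly_fixed_cost: all-in self-hosting cost in dollars per month
    api_price_per_m:    hosted API price in dollars per million tokens
    """
    if api_price_per_m <= 0:
        raise ValueError("api_price_per_m must be positive")
    return monthly_fixed_cost / api_price_per_m * 1_000_000


# Hypothetical inputs (illustrative only).
MONTHLY_SELF_HOST_COST = 6_000.0  # hardware, operations, on-call time
HOSTED_PRICE_PER_M = 1.5          # blended dollars per million tokens

threshold = breakeven_tokens_per_month(MONTHLY_SELF_HOST_COST, HOSTED_PRICE_PER_M)
print(f"break-even: {threshold / 1e9:.1f} billion tokens per month")
```

Below the threshold, the hosted per-token price is the cheaper option despite looking more expensive per unit. Above it, the fixed cost of running your own hardware starts to pay for itself, provided the operational burden is priced honestly into the monthly figure.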
