News · 2026-06-25

OpenAI designs its own chip to run its models

OpenAI has spent years renting compute that runs on other people's chips. This week, with the semiconductor company Broadcom, it announced something different: a chip of its own, reportedly nicknamed "Jalapeno," designed to do one specific thing well. The reporting comes from Ars Technica.

To understand why this matters, it helps to know there are two very different jobs a chip does in the AI world. The first is training, where a model learns from enormous piles of data over weeks or months. The second is inference, which is what happens every single time you actually use the model: the split second in which it reads your question and writes an answer. Training a given model happens once. Inference happens billions of times a day, forever. For a company serving hundreds of millions of users, inference is the bill that never stops arriving.

Jalapeno is built only for that second job. It is what the industry calls an inference chip, tuned narrowly to run already-trained models as fast and as cheaply as possible. Think of it like the difference between the factory that designs and builds a car and the engine that runs in it every day. OpenAI isn't trying to build a do-everything chip to rival the general-purpose graphics processors that train models. It is trying to build the most efficient possible engine for the one task it pays for constantly.

The reason to do this yourself, rather than buy off the shelf, is control and margin. Today the lion's share of AI compute runs on chips from a single dominant supplier, which means that supplier sets the price and the waiting list. By designing its own chip, OpenAI can shape the silicon around the exact way its own models think (the specific math, the memory patterns, the way attention flows through a transformer), and it stops paying someone else's markup on every query. Broadcom is the right partner for this because it has done this before: it quietly builds the custom chips behind several of the big cloud companies' in-house accelerators.

The striking claim in the announcement is the speed of development. Building custom silicon usually takes years. OpenAI and Broadcom describe a roughly nine-month cycle, which is fast enough to raise eyebrows. The likely explanation is that they leaned heavily on Broadcom's existing building blocks and kept the chip's job narrow: an inference-only chip aimed at a known set of models is a far smaller design problem than a general processor.

Why it matters: this is the clearest sign yet that the frontier AI labs want to own their whole stack, from the silicon up. It lands the same week that Qualcomm agreed to buy the software-compiler company Modular, another move to control a layer of the AI pipeline that used to belong to someone else. The competitive battle in AI is shifting from "who has the smartest model" toward "who can run a good model for the least money," because once several labs have comparable models, cost per answer is what decides who wins. Owning the chip is a direct attack on that cost.

The honest caveat is that almost everything quantitative here is a vendor claim. OpenAI says the chip's efficiency, the amount of useful work it does per unit of electricity, is substantially better than the best available alternatives. That is exactly the kind of statement every chip company makes on announcement day, and there are no independent measurements yet. Custom inference chips are also famously easy to announce and hard to deploy at scale: by the time a chip tuned for today's models is running across a huge fleet, the models themselves may have changed shape. Until outside engineers can measure a real Jalapeno running a real workload, treat the performance story as a strong strategic signal rather than a proven result. What is not in doubt is the direction: the company that popularized renting AI compute now wants to make its own.
