What is a context window?
Every time you talk to an AI model, there's a hard limit on how much text it can consider at once: the conversation so far, the documents you've pasted, and the instructions it was given all have to fit. That limit is the context window, measured in tokens (chunks of text that are often a whole word or a piece of one). Think of it as the model's working memory: anything inside the window, it can use; anything that falls outside, it simply cannot see. Understanding this one concept explains a surprising amount about why models behave the way they do.
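To make "falls outside the window" concrete, here is a minimal sketch of how an app might trim a conversation to fit. It assumes one token per whitespace-separated word (real tokenizers split text into subword pieces), and `count_tokens`, `fit_to_window` and the 12-token budget are hypothetical, chosen only for illustration.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fit_to_window(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older fall outside the window
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "My name is Ada and I live in Lisbon.",
    "What's a good book about whales?",
    "Try Moby-Dick.",
    "Where do I live?",
]
print(fit_to_window(conversation, budget=12))
```

With a 12-token budget, the first message (the only one that says where the user lives) no longer fits, so a model shown just this window has no way to answer the final question.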
Why there's a limit at all
Modern language models are built on the Transformer, introduced in the 2017 paper Attention Is All You Need. Its key mechanism, attention, lets the model weigh how much every piece of text should care about every other piece. That's powerful, but it has a cost: in the basic design, comparing every token to every other token means the work grows roughly with the square of the length. Double the text and you roughly quadruple the effort. That quadratic cost is the wall that historically kept context windows small.
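Here is a minimal sketch of basic (full) scaled dot-product attention in plain Python, with a counter for how many token-to-token scores get computed. It assumes tiny, made-up token vectors from a hypothetical `toy_tokens` helper and, as a simplification, reuses the same vectors as queries, keys and values (real models compute these with separate learned projections and run on specialized hardware).

```python
import math


def attention(queries, keys, values):
    """Full scaled dot-product attention on plain Python lists.

    Returns the output vectors and how many query-key scores were computed.
    """
    d = len(keys[0])
    outputs, comparisons = [], 0
    for q in queries:
        # Score this token against every token: len(keys) comparisons per query.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        comparisons += len(scores)
        # Softmax (shifted by the max for numerical stability) gives weights summing to 1.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs, comparisons


def toy_tokens(n, dim=4):
    # Hypothetical deterministic "token vectors", just to have something to attend over.
    return [[math.sin(i + j) for j in range(dim)] for i in range(n)]


for n in (8, 16, 32):
    x = toy_tokens(n)
    _, comparisons = attention(x, x, x)
    print(n, comparisons)
```

Going from 8 to 16 to 32 tokens takes the comparison count from 64 to 256 to 1,024: each doubling of the text quadruples the work.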
A lot of clever engineering has gone into pushing the wall back. Longformer showed you don't need every token to attend to every other one: letting most tokens look only at a local window of neighbors, plus a few global tokens that see everything, keeps the cost growing roughly linearly with length and makes long documents affordable. Techniques for telling the model where each token sits in the sequence matter too. The rotary position embeddings introduced in RoFormer encode position in a way that, with later adjustments such as position interpolation, can be stretched to inputs longer than the model was trained on. Advances like these are part of why some models today can hold hundreds of thousands of tokens at once, enough to take in a whole book or a large codebase in a single go.
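The sketch below counts which token pairs get scored under a Longformer-style pattern: a sliding local window plus optional global tokens. It is a simplified illustration, not Longformer's actual implementation; `sparse_pairs`, the window size of 64 and the choice of position 0 as the only global token are assumptions made for the example.

```python
def sparse_pairs(n, window, global_positions=()):
    """Return the (query, key) pairs a Longformer-style pattern would score.

    Each token looks at neighbours up to `window` positions away on either side;
    tokens listed in `global_positions` attend to, and are attended by, every token.
    """
    allowed = set()
    for i in range(n):
        # Local window, clipped at the start and end of the sequence.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            allowed.add((i, j))
    for g in global_positions:
        for j in range(n):
            allowed.add((g, j))  # the global token sees everything
            allowed.add((j, g))  # and everything sees the global token
    return allowed


for n in (1_000, 2_000, 4_000):
    sparse = len(sparse_pairs(n, window=64, global_positions=(0,)))
    print(f"{n} tokens: full attention {n * n:,} pairs, sparse {sparse:,} pairs")
```

Full attention quadruples each time the length doubles, while the sparse count only roughly doubles, because each ordinary token scores a fixed-size neighborhood no matter how long the document gets.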
An analogy
A context window is like the desk you're working at. A small desk forces you to keep swapping papers in and out, losing track of what you set aside. A huge desk lets you spread every document out and see them all together. But — and this is the catch — a bigger desk doesn't automatically mean you read everything on it carefully. You still tend to focus on what's right in front of you and let the stuff in the far corners blur.
Long window ≠ good memory
This is the most important and least appreciated point. A model having room for a long document doesn't mean it actually uses all of it well. The Lost in the Middle study found a striking pattern: models tend to use information best when it sits near the beginning or end of a long context, and accuracy drops noticeably when the relevant detail is buried in the middle, like a reader who skims the center of a long report. So a giant context window is a real capability, but "it fits" and "it was understood" are different claims.
It's also not the same as persistent memory. The window resets between sessions, and even within one task, a model can lose track of anything that is no longer inside the window. The limitation shows up vividly in world models (systems that simulate an interactive scene), which can forget parts of the scene once they move off-frame. True long-term memory usually has to be bolted on separately, by storing information outside the model and retrieving the relevant bits back into the window when needed.
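Here is a minimal sketch of that "store outside, retrieve back in" idea. Everything in it is hypothetical: `NoteStore`, `build_prompt` and the stopword list are invented for illustration, and ranking notes by shared words is a crude stand-in for the embedding-based search real systems often use.

```python
import re

# Small, hypothetical stopword list so common words don't count as matches.
STOP = {"a", "an", "the", "is", "are", "my", "what", "when", "where", "which", "of", "in"}


def words(text: str) -> set[str]:
    # Lowercased content words of the text.
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOP


class NoteStore:
    """Hypothetical external memory: notes live outside the model."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str, k: int = 2) -> list[str]:
        # Rank notes by how many words they share with the query;
        # return up to k of them, skipping notes that share nothing.
        q = words(query)
        ranked = sorted(self.notes, key=lambda n: len(q & words(n)), reverse=True)
        return [n for n in ranked[:k] if q & words(n)]


def build_prompt(store: NoteStore, question: str) -> str:
    # Only the retrieved notes are copied back into the context window.
    context = "\n".join(f"- {n}" for n in store.recall(question))
    return f"Relevant notes:\n{context}\n\nQuestion: {question}"


store = NoteStore()
store.remember("Home city: Lisbon.")
store.remember("Prefers short answers.")
store.remember("Project deadline is Friday.")
print(build_prompt(store, "What is my home city?"))
```

The store can grow far beyond what the window holds, because only the few notes that look relevant to the current question are pulled back in.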
Why it matters
The context window sets the ceiling on what a model can do in one shot: how big a document it can summarize, how much code it can reason about, how long a conversation stays coherent. Growing it has unlocked genuinely new uses — feeding a model your entire contract instead of chopping it into fragments. But the marketing number ("a million tokens!") oversells the reality. The honest way to read a big context window: it's the size of the desk, not a guarantee that everything on it gets read. When it matters, test whether the model actually used the part you care about — especially if it was sitting in the middle.
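One simple way to run that check is to hide a known fact at different depths of a long document and see whether it comes back. The sketch assumes a hypothetical `build_test_prompt` helper and replaces the real model with `toy_reader`, a deliberate caricature that only reads the first and last 50 words (real models degrade more gradually). To test an actual model, send each prompt to it and check whether its answer contains the hidden code.

```python
def build_test_prompt(filler: list[str], needle: str, position: float, question: str) -> str:
    """Insert `needle` at a relative position (0.0 = start, 1.0 = end) of the filler text."""
    idx = round(position * len(filler))
    doc = filler[:idx] + [needle] + filler[idx:]
    return " ".join(doc) + f"\n\nQuestion: {question}"


def toy_reader(prompt: str, edge_words: int = 50) -> str:
    # Stand-in for a real model: it only "reads" the first and last few words,
    # exaggerating the weak-middle pattern. Replace with a call to the model under test.
    w = prompt.split()
    return " ".join(w[:edge_words] + w[-edge_words:])


filler = [f"Filler sentence number {i}." for i in range(200)]
needle = "The vault code is 4172."
for pos in (0.0, 0.5, 1.0):
    prompt = build_test_prompt(filler, needle, pos, "What is the vault code?")
    found = "4172" in toy_reader(prompt)
    print(f"needle at {pos:.0%}: found={found}")
```

Sweeping the position from start to end, rather than testing one spot, is what exposes a weak middle.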