long-context
The KV cache: why AI gets slower and hungrier the longer it talks · Lesson
The hidden notebook that lets a model avoid re-reading every previous word, and the single biggest reason long context is expensive.
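Here is a minimal Python sketch of the "hidden notebook", built on stated assumptions. `kv_cache_bytes` estimates cache size from a model's shape. The toy `KVCache` and `attend` show the decode-time pattern: each new token computes only its own key and value, appends them, and then attends over everything stored so far. The model shape in the example (32 layers, 8 key/value heads, head size 128, 16-bit values) is hypothetical and not taken from any model on this page. Real implementations also do this per head and per layer, batched on a GPU.

```python
import math


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # Factor 2: one key vector and one value vector are stored per token,
    # per key/value head, per layer. bytes_per_value=2 assumes 16-bit floats.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


class KVCache:
    """Toy notebook of past keys and values for a single attention head."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        # Only the newest token's key/value are computed; older ones are reused.
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


def attend(query, cache):
    # Scaled dot-product attention of one new query over every cached key.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in cache.keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values)) / total
            for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical model shape, not any specific model listed here.
    print(kv_cache_bytes(32, 8, 128, seq_len=1))  # 131072 bytes = 128 KiB per token
    print(kv_cache_bytes(32, 8, 128, seq_len=1_000_000) / 2**30)  # 122.0703125 GiB

    cache = KVCache()
    for k, v in [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [3.0, 4.0])]:
        cache.append(k, v)  # one new token per decoding step
        print(len(cache), attend([1.0, 0.0], cache))
```

The cache grows linearly with the number of tokens. The same hypothetical shape that needs 512 MiB at 4,096 tokens needs roughly 122 GiB at a million tokens.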
DeepSeek's new open models give everyone a million-token memory by default · News
DeepSeek previewed two free-to-download V4 models that can read a million tokens at once, no longer as a premium add-on but as the standard setting.
What is a context window? · Lesson
A model's context window is how much text it can hold in mind at once — its working memory. Bigger is useful, but a long window isn't the same as a good memory. Here's how it works and where it breaks.
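A fixed window means something has to decide what falls out of it. Below is a minimal sketch of the simplest policy an application might use: keep the newest messages that fit and drop the oldest. It assumes a word count can stand in for a token count, although real tokenizers split text differently. `fit_to_window` is a hypothetical helper, not any product's API.

```python
def fit_to_window(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined size fits in max_tokens.

    count_tokens defaults to a whitespace word count, a rough stand-in for
    a real tokenizer (an assumption made for this sketch).
    """
    kept = []
    used = 0
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break  # this message and everything older falls out of the window
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["hello there", "tell me about KV caches", "and context windows too"]
    print(fit_to_window(history, max_tokens=8))  # ['and context windows too']
```

On the next turn, the model cannot see anything that was dropped, however important it was.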
MiniMax-M3 · Tool
A natively multimodal open model trained on text, image, and video from the first step, with a million-token context and a sparse-attention design built for speed; downloadable for self-hosting and also offered through MiniMax's own API and agent platform.
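This listing does not describe MiniMax-M3's actual sparse-attention design. The sketch below therefore uses a generic stand-in: a causal sliding window. This is one common sparse pattern, in which each token attends only to its most recent `window` tokens instead of all earlier ones. The functions are hypothetical and only count attention pairs, to show why sparsity saves work at long lengths.

```python
def dense_causal_pairs(seq_len):
    # Full causal attention: token i attends to tokens 0..i, i.e. i + 1 pairs.
    return seq_len * (seq_len + 1) // 2


def sliding_window_pairs(seq_len, window):
    # Sliding window: token i attends to at most the last `window` tokens,
    # itself included.
    return sum(min(i + 1, window) for i in range(seq_len))


def sliding_window_mask(seq_len, window):
    # mask[i][j] is True when token i is allowed to attend to token j.
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    for row in sliding_window_mask(5, 2):
        print("".join("x" if allowed else "." for allowed in row))
    n, w = 1_000_000, 4_096
    print(dense_causal_pairs(n), sliding_window_pairs(n, w))
```

At a million tokens with a 4,096-token window, this counts about 4.1 billion pairs instead of about 500 billion. The trade-off is that a pure sliding window cannot see anything older than the window.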
GLM-5.2 · Tool
A flagship openly available language model with a very large context window for long documents and code. Free to download and run yourself, with compressed versions for more modest hardware.
DeepSeek-V4 (Pro & Flash) · Tool
Two newly previewed open-weight models with a 1-million-token context window on by default: a large mixture-of-experts flagship and a smaller, fast everyday model. Downloadable weights plus an API.
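A mixture-of-experts model replaces one large feed-forward block with many smaller "experts". A router sends each token to only a few of them, so the total parameter count can be large while per-token compute stays modest. The sketch below shows generic top-k gating in plain Python, one of several gating variants. It is not DeepSeek-V4's actual routing scheme, which this listing does not describe. `moe_layer`, the router vectors, and the toy experts are all hypothetical.

```python
import math


def moe_layer(x, router_weights, experts, k=2):
    """Route vector x to the k highest-scoring experts and mix their outputs.

    router_weights: one weight vector per expert (score = dot product with x).
    experts: callables mapping a vector to a vector of the same length.
    Returns the mixed output and the indices of the experts that ran.
    """
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in router_weights]
    # Pick the k experts with the highest router scores.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the chosen experts only, so their gate weights sum to 1.
    top = max(scores[i] for i in chosen)
    gates = {i: math.exp(scores[i] - top) for i in chosen}
    total = sum(gates.values())
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # unchosen experts never run: that is the saving
        for j, yj in enumerate(y):
            out[j] += gates[i] / total * yj
    return out, chosen


if __name__ == "__main__":
    # Four toy experts that just scale their input by 1, 2, 3 or 4.
    experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
    out, chosen = moe_layer([1.0, 0.0], router, experts, k=1)
    print(chosen, out)  # [2] [3.0, 0.0]
```

With `k=1`, only the third expert runs, and its output passes through unchanged. Raising `k` mixes in more experts at proportionally more compute.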