attention

Everything on Ground Truth tagged “attention” — 5 items.

The KV cache: why AI gets slower and hungrier the longer it talks Lesson

The hidden notebook that lets a model avoid re-reading every previous word, and the single biggest reason long context is expensive.

DeepSeek's new open models give everyone a million-word memory by default News

DeepSeek previewed two free-to-download V4 models that can read a million tokens at once, offered as the standard setting rather than as a premium add-on.

Transformers: the engine inside almost every modern AI Lesson

The neural-network design behind GPT, Claude, and nearly every modern AI model, and the one idea, attention, that made it work.

A small but elegant idea: putting 'experts' inside the attention layer News

Grouped Query Experts brings the mixture-of-experts trick into attention, activating only half a model's query heads per token while matching the full version, at least at small scale.

A Classic Efficiency Trick Just Moved Into a New Part of the AI News

For years, the committee-of-specialists design that keeps big models fast lived in one layer of the network. A clean new result shows it works in the attention layer too, halving some of the work for free.