sparse-attention

Everything on Ground Truth tagged “sparse-attention” — 1 item.

Sparse Attention Lesson

Sparse attention lets a transformer skip most of the pairwise comparisons between tokens. Instead of every token attending to every other token, each one attends to a chosen subset, which is a large part of what makes million-token context windows affordable.
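
To give a feel for the saving, here is a minimal sketch that counts comparisons under full attention and under one possible choice of subset, a sliding window. The window pattern, the function names, and the window size are assumptions made for illustration; the summary above does not say which sparsity pattern is meant.

```python
def full_attention_pairs(n: int) -> int:
    """Comparisons when every one of n tokens attends to every token."""
    return n * n


def sliding_window_neighbors(i: int, n: int, window: int) -> range:
    """Indices token i attends to: itself plus up to `window` tokens per side.

    Hypothetical pattern chosen for illustration; the range is clipped at
    the start and end of the sequence.
    """
    lo = max(0, i - window)
    hi = min(n, i + window + 1)
    return range(lo, hi)


def sliding_window_pairs(n: int, window: int) -> int:
    """Total comparisons when each token attends only to its window."""
    return sum(len(sliding_window_neighbors(i, n, window)) for i in range(n))


if __name__ == "__main__":
    n, window = 1_000_000, 256  # assumed sizes, for illustration only
    print("full:  ", full_attention_pairs(n))
    print("sparse:", sliding_window_pairs(n, window))
```

Under these assumptions, a 1,000,000-token sequence needs about 513 million comparisons with a window of 256, against 10^12 for full attention. The sparse cost grows linearly with sequence length rather than quadratically.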