sparse-attention
Everything on Ground Truth tagged “sparse-attention” — 1 item.
Sparse Attention Lesson
Sparse attention lets a transformer skip most of the pairwise comparisons between tokens. Instead of every token attending to every other token, each one attends to a chosen subset. When that subset stays small, the cost grows roughly linearly with sequence length rather than quadratically, which is what makes million-token context windows affordable.
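To make the savings concrete, here is a minimal sketch that counts query-key comparisons under one possible choice of subset: a sliding window, where each token attends only to itself and its nearest neighbours. The lesson summary does not say which pattern it has in mind, so the window pattern and the names `sliding_window_mask`, `sparse_pairs`, `dense_pairs`, and `window` are all assumptions made for illustration.

```python
def sliding_window_mask(n_tokens: int, window: int) -> list[list[bool]]:
    """Boolean mask: mask[i][j] is True when token i may attend to token j.

    Assumed pattern (not taken from the lesson): token i sees token j
    only if they are at most `window` positions apart.
    """
    return [
        [abs(i - j) <= window for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


def sparse_pairs(n_tokens: int, window: int) -> int:
    """Count the allowed (query, key) pairs without building the mask."""
    total = 0
    for i in range(n_tokens):
        lo = max(0, i - window)             # clip the window at the start
        hi = min(n_tokens - 1, i + window)  # clip the window at the end
        total += hi - lo + 1
    return total


def dense_pairs(n_tokens: int) -> int:
    """Full attention compares every token with every token."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    n, w = 1_000_000, 256  # illustrative sizes, chosen arbitrarily
    dense = dense_pairs(n)
    sparse = sparse_pairs(n, w)
    print(f"dense:  {dense:,} comparisons")
    print(f"sparse: {sparse:,} comparisons")
    print(f"fraction kept: {sparse / dense:.6f}")
```

With a fixed window, the sparse count grows in proportion to the number of tokens, while the dense count grows with its square. Real systems often combine several patterns (for example local windows plus a few globally visible tokens), so this should be read as a counting exercise rather than a description of any particular model.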