speculative-decoding

Everything on Ground Truth tagged “speculative-decoding” — 4 items.

The trick that makes AI type faster just hit the top of Hacker News News

A small model guesses ahead and a big model checks the work in parallel. This week two efforts pushing that idea, DeepSeek's DSpark and JetSpec, lit up the front page while the community argued over whether it's truly 'lossless.'

Speculative Decoding: How AI Types Faster Without Changing a Word Lesson

A small, fast model guesses the next few words (tokens) and a big, slow model checks them all in one pass, producing exactly the same output the big model would have produced on its own, just quicker. It is the trick behind a lot of modern AI speedups.
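The guess-then-check loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of any project listed here: `target_model` and `draft_model` are hypothetical toy functions standing in for the big and small models, decoding is greedy (always take the single most likely token), and the verification step loops over positions where a real system would score them all in one batched forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to the next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Hypothetical "big" model: a toy deterministic rule, not a real LLM.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except when the
    # context length is a multiple of 4, where it guesses wrong on purpose.
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(out) < goal:
        # 1. Draft k tokens autoregressively with the cheap model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. Verify. Counted as ONE target pass, because a real system would
        #    score all k + 1 positions in a single batched forward pass.
        verify_passes += 1
        accepted: List[int] = []
        for i in range(k + 1):
            expected = target(out + accepted)
            # Always keep the target's own token, so output never deviates.
            accepted.append(expected)
            # Stop at the first wrong guess, or after the bonus token that
            # follows k correct guesses.
            if i == k or proposal[i] != expected:
                break
        out.extend(accepted)
    return out[:goal], verify_passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
fast, passes = speculative_decode(target_model, draft_model, prompt, 20)
assert fast == baseline  # same tokens as the big model alone
print(f"{passes} verification passes for 20 new tokens")
```

Every token appended comes from the target's own prediction, which is why the output matches plain greedy decoding; the draft only determines how many tokens each verification pass yields. The sketch covers greedy decoding only. When tokens are sampled rather than picked greedily, implementations typically use a rejection-sampling acceptance rule to preserve the target's output distribution, which this example does not show.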

JetSpec Tool

Speculative decoding with parallel tree drafting, aiming for large, lossless inference speedups. Project page and writeup with code; reports generation up to several times faster, depending on the model and workload.

DeepSeek DSpark Tool

Open-source speculative-decoding implementation that uses parallel tree drafting to speed up text generation with no change to the model's output. This is the project that topped Hacker News this week. Drop-in inference speedups for self-hosted models.