News · 2026-06-25

A language model that writes by erasing, and now keeps up with the classics

Almost every AI chatbot you have used writes the same way: one word after another, strictly left to right, each new word chosen based on everything written so far. It is a bit like speaking out loud with no ability to go back: once a word is out, it is committed. This approach, called autoregression, has powered the entire chatbot era and works remarkably well. But there has always been a different idea waiting in the wings, and a newly released model called iLLaDA has just given it some of its strongest evidence yet. The model is described in a paper on arXiv, with weights and code released for anyone to use.

The alternative borrows its trick from the AI that makes images. Picture-generating models start with a field of pure noise and refine it, step by step, into a coherent image, sharpening the whole canvas at once rather than painting one pixel at a time. iLLaDA does the language version of this. Instead of writing left to right, it starts with a passage where many words are blank, hidden behind a kind of mask, and then fills them in over several passes, refining the whole passage together. This family of models is called diffusion language models, and the appeal is easy to feel: a writer who can see the whole draft at once and revise any part of it should, in principle, be better at planning ahead and at fixing the middle of a sentence after seeing the end.
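To make the contrast concrete, here is a toy sketch of the two writing styles. It is not iLLaDA's actual sampler: the "model" is a hypothetical lookup table (`TOY_PREDICTIONS`) with made-up confidence scores, and the rule of committing the most confident guesses first is one common way to fill in masks, assumed here for illustration only.

```python
MASK = "_"

# Hypothetical stand-in for a trained network: a fixed word and a made-up
# confidence score for each position. A real model would recompute these
# from the partially filled draft on every pass.
TOY_PREDICTIONS = [
    ("the", 0.9),
    ("cat", 0.4),
    ("sat", 0.6),
    ("on", 0.8),
    ("the", 0.7),
    ("mat", 0.5),
]


def toy_predict(draft):
    """Return {position: (word, confidence)} for every still-masked slot."""
    return {i: TOY_PREDICTIONS[i] for i, tok in enumerate(draft) if tok == MASK}


def fill_by_refinement(length, per_pass):
    """Start fully masked; on each pass, commit the most confident guesses."""
    draft = [MASK] * length
    history = [" ".join(draft)]
    while MASK in draft:
        guesses = toy_predict(draft)
        # Choose the masked positions the toy model is most sure about.
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)[:per_pass]
        for i in best:
            draft[i] = guesses[i][0]
        history.append(" ".join(draft))
    return history


def fill_left_to_right(length):
    """Autoregressive-style baseline: commit one word at a time, in order."""
    draft = [MASK] * length
    history = [" ".join(draft)]
    for i in range(length):
        draft[i] = TOY_PREDICTIONS[i][0]
        history.append(" ".join(draft))
    return history


if __name__ == "__main__":
    for line in fill_by_refinement(length=6, per_pass=2):
        print(line)
```

Run with two words per pass, the refinement version fills the six-word sentence in three passes, touching the beginning, middle, and end in whatever order the scores dictate, while the left-to-right version always needs six steps in a fixed order.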

For years the catch was that this approach did not scale. It was a charming research curiosity that fell behind the left-to-right models as soon as the stakes got serious. iLLaDA is an attempt to close that gap, and the name is literal: it is the improved successor to an earlier model called LLaDA. It is an eight-billion-parameter model, a respectable size, trained from scratch on an enormous amount of text using the diffusion recipe all the way through, never falling back on the usual left-to-right method. The headline result is twofold: it improves broadly over its predecessor across general knowledge, math, and coding tasks, and it also stays competitive with a strong, similarly sized conventional model. In other words, writing by refinement is no longer obviously the weaker choice at this scale.

Why this matters: for the whole modern era of AI, the field has essentially placed one giant bet, that left-to-right prediction is the road to capable language models. iLLaDA is evidence that there is a second viable road, and viable roads are valuable even when the first one is working, because they tend to be good at different things. The researchers argue their approach has natural advantages for reasoning that runs both forward and backward, for planning over long stretches, and for squeezing more out of limited data, since it can revisit the same material from many angles rather than reading it once front to back. A field with two healthy architectures instead of one is a field with more room to improve. It is the same spirit as earlier diffusion results we covered, like the open model that writes by refining a whole draft at once and the demonstration of text that arrives all at once.

The honest caveat: 'competitive with a strong conventional model' is the kind of claim that needs careful reading. The comparison only means something if both models were trained with similar amounts of computing power and data, an apples-to-apples match rather than a flattering pairing, and that is exactly the detail to scrutinize before declaring the gap closed. Independent reproduction by other groups is what would turn this from a promising paper into an established result. It is also worth being clear about what 'competitive' does and does not mean. It does not mean 'better than the best models in the world.' It means 'this overlooked approach can hang with a serious peer in the same weight class,' which, after years of the diffusion idea trailing badly, is a genuinely meaningful turn, and worth watching to see whether the road keeps climbing.

Primary source, verified: arXiv 2606.25331