Ground Truth.
AI, checked against the source.

News · 2026-06-20

An openly released text model that writes by refining, not word by word

Almost every language model you've used writes the same way: one word at a time (strictly, one token, which may be a word or a fragment of one), left to right, each word chosen based on everything before it. It's a bit like speaking without ever being able to go back and revise: once a word is out, it's committed. This approach, called autoregression, has powered the entire chatbot era. But there's a long-running alternative idea, and a new openly released model just pushed it to a serious size.
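For readers who think in code, here is a minimal Python sketch of that left-to-right loop. The lookup table and function names are invented for illustration. They stand in for a real model, which would score an entire vocabulary at every step instead of reading from a fixed table.

```python
# Toy "model": maps everything written so far to the next word.
# The table is made up for this example; it is not taken from any real model.
CONTINUATIONS = {
    (): "the",
    ("the",): "ink",
    ("the", "ink"): "dries",
    ("the", "ink", "dries"): "slowly",
}


def next_word(prefix):
    """Return the next word given the full prefix, or None to stop."""
    return CONTINUATIONS.get(tuple(prefix))


def generate_left_to_right(max_words=10):
    words = []
    for _ in range(max_words):
        word = next_word(words)
        if word is None:
            break
        words.append(word)  # committed: this loop never revisits earlier words
    return words


print(" ".join(generate_left_to_right()))  # prints: the ink dries slowly
```

The thing to notice is the append: each word depends on the whole prefix, but nothing in the loop can change a word once it has been added.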

The model is called Sumi, and it's a diffusion language model. To understand what that means, it helps to borrow from image generation. AI image models like the ones behind today's art tools don't paint a picture stroke by stroke; they start with random noise and gradually refine the whole image at once, sharpening it over many passes until a coherent picture emerges. Diffusion language models do the same thing with text: rather than committing to words one at a time, they start with a rough, garbled draft of the entire passage and repeatedly clean it up, all positions at once, until fluent text appears.
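A rough sketch of that refine-everything-at-once loop, in Python. This is a toy under stated assumptions, not Sumi's actual sampler: it uses a masked-style scheme in which the draft starts fully blanked out and each pass keeps the most confident proposals. The `GUESSES` table, the `denoise` function, and the `per_pass` setting are all hypothetical stand-ins for a real model call.

```python
MASK = "_"

# Hypothetical model output: a (word, confidence) guess for every position.
# A real model would recompute these from the current draft on each pass.
GUESSES = [("the", 0.6), ("ink", 0.9), ("dries", 0.7), ("slowly", 0.8)]


def denoise(draft):
    """Toy stand-in for one model call: propose a word for all positions at once."""
    return GUESSES


def refine(per_pass=2):
    draft = [MASK] * len(GUESSES)
    history = [list(draft)]
    while MASK in draft:
        proposals = denoise(draft)
        open_positions = [i for i, w in enumerate(draft) if w == MASK]
        # Keep the most confident proposals among the still-blank positions.
        open_positions.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in open_positions[:per_pass]:
            draft[i] = proposals[i][0]
        history.append(list(draft))
    return history


for step in refine():
    print(" ".join(step))
# prints:
# _ _ _ _
# _ ink _ slowly
# the ink dries slowly
```

Note that the words do not arrive in reading order: the second and fourth positions settle first, and the whole four-word passage is finished in two passes.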

Why would anyone want this? The appeal is revision. Because a diffusion model works on the whole passage simultaneously and refines it over multiple passes, it can in principle go back and fix earlier words in light of later ones, something a strict left-to-right model cannot do once a word has been emitted. That opens the door to a kind of self-correction that's awkward for conventional models, and it also allows generating many parts of the text in parallel rather than strictly in sequence, which could be faster in some setups. For years this remained mostly a research curiosity, demonstrated at small scale and rarely with openly available weights.
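To make "fixing an earlier word in light of a later one" concrete, here is a deliberately tiny Python sketch. A hand-written grammar rule (choosing "a" or "an" from the word that follows) stands in for a learned model, and the function names are invented for this example. It illustrates the idea of revision only and says nothing about how well any real model revises.

```python
VOWELS = "aeiou"


def revise_pass(draft):
    """One refinement pass: re-propose every position using the whole draft."""
    revised = list(draft)
    for i, word in enumerate(draft[:-1]):
        following = draft[i + 1]
        # The right article depends on a word that comes later in the text.
        if word in ("a", "an"):
            revised[i] = "an" if following[0] in VOWELS else "a"
    return revised


def refine_until_stable(draft, max_passes=5):
    """Repeat passes until a pass changes nothing; return (draft, passes run)."""
    passes = 0
    while passes < max_passes:
        new_draft = revise_pass(draft)
        passes += 1
        if new_draft == draft:
            break
        draft = new_draft
    return draft, passes


final, passes = refine_until_stable(["a", "writer", "with", "a", "eraser"])
print(" ".join(final), "| passes:", passes)
# prints: a writer with an eraser | passes: 2
```

The first pass corrects "a eraser" to "an eraser" by looking ahead, and the second pass confirms nothing else needs changing. A left-to-right generator that had already emitted "a" would have no such second look.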

What makes Sumi notable is the combination of scale and openness. It's a genuinely mid-sized model, in the range of capable open models people actually run, trained from scratch on an enormous amount of text, and its creators at Tohoku NLP released it openly rather than publishing only a paper: the model weights are on Hugging Face and the code is on GitHub. That's the part that moves the field. Researchers and tinkerers can now download a real, non-trivial diffusion language model and study how it behaves, where it shines, and where it breaks, rather than taking a lab's word for it. Open releases like this are how a niche idea gets a fair, broad test.

An analogy for the two styles: an autoregressive model is a speaker giving a live, unscripted talk — fluent, but unable to un-say anything. A diffusion model is a writer with a full draft and an eraser, sweeping over the whole page again and again, tightening a phrase here, fixing an earlier word there, until the whole thing reads well. Both can produce excellent results; they just get there by very different routes, and the writer's ability to revise is the thing researchers are most curious about.

Why it matters: the dominance of left-to-right generation is so total that it's easy to forget it's a choice, not a law of nature. Every serious, openly released alternative is a chance to learn whether the mainstream approach is truly best or merely entrenched. If diffusion language models can match conventional ones while adding genuine self-correction and parallel generation, that reshapes assumptions about how text AI should be built. Even if they can't quite match them yet, knowing where and why they fall short is valuable knowledge that only open models make possible.

The honest caveat is that the headline promise — real, useful self-correction — still has to prove itself at this scale. It's one thing for the math to allow revision; it's another for a model this size to actually revise in ways that improve its answers rather than just churn. The hard, open question Sumi lets the community finally probe is whether diffusion's theoretical advantages show up in practice when the model is big enough to matter. That we can now ask the question with a real model in hand, openly, is the achievement.


Primary source, verified: arXiv 2606.19005