The Sequence Knowledge #862: Learning About Text Diffusion Models

If you look at the architecture of the modern AI boom, it is heavily bifurcated by modality. In the visual domain, we are almost entirely ruled by diffusion models. From Midjourney to Stable Diffusion to OpenAI’s Sora, the paradigm of starting with pure noise and iteratively denoising it into a high-fidelity image or video has proven to be unreasonably effective.

But in the realm of text, diffusion has historically been an afterthought. Large Language Models (LLMs) like GPT-4, Claude, and LLaMA are staunchly autoregressive (AR). They are sequence predictors. They look at the context, predict the next token, append it to the context, and repeat. It is a strictly left-to-right, causal process.

For years, the consensus was simple: autoregression is just the native physics of language. But this sequential paradigm has glaring pathologies. Because AR models generate blindly from left to right, they cannot easily engage in global planning. If they make a slight logical error early in a sequence, that error is committed to the context window permanently and every later token is conditioned on it, leading to cascading failures, a phenomenon often critiqued as “generation drift.” Furthermore, AR models suffer from the “reversal curse”: they can easily recite a poem forward, but if you ask them to recite it backward, their left-to-right training gives them little to work with and they often fail.
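The predict, append, repeat loop described above can be sketched in a few lines. This is a toy illustration only, not how any production LLM is implemented: the `BIGRAM` table, `next_token`, and `generate` are hypothetical names invented for this example, the "model" looks only at the last token, and decoding is plain greedy selection.

```python
from typing import Dict, List

# Hypothetical toy "model": next-token scores conditioned on the last token only.
# A real LLM conditions on the whole context with a neural network.
BIGRAM: Dict[str, Dict[str, float]] = {
    "<bos>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 1.0},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}


def next_token(context: List[str]) -> str:
    """Greedy choice: return the highest-scoring token given the context."""
    scores = BIGRAM.get(context[-1], {"<eos>": 1.0})
    return max(scores, key=lambda tok: scores[tok])


def generate(prompt: List[str], max_new_tokens: int = 10) -> List[str]:
    """Autoregressive loop: predict the next token, append it, repeat."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(context)
        if tok == "<eos>":
            break
        # Once appended, a token is never revisited or revised.
        context.append(tok)
    return context


if __name__ == "__main__":
    print(generate(["<bos>"]))       # ['<bos>', 'the', 'cat', 'sat']
    print(generate(["<bos>", "a"]))  # ['<bos>', 'a', 'dog', 'ran']
```

The second call hints at why early mistakes are sticky: once a low-probability token such as "a" is in the context, everything after it is conditioned on that choice, and the loop has no step that goes back to revise it.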
