What: Google released DiffusionGemma, an open-weight model whose headline trick is parallel block decoding — it writes text by refining a whole block of tokens at once through iterative denoising, instead of predicting one next token at a time.

Why: Decoding is the slow, sequential part of running an LLM: emitting N tokens normally costs N forward passes that each wait on the last. Laying down a block in parallel is why DiffusionGemma reports up to 4x faster decode and 1000+ tokens/sec on an H100.

vs prior: Versus standard autoregressive decoding — left-to-right, one token per forward pass under a causal mask — DiffusionGemma starts from a canvas of 256 placeholder tokens and refines them all at once with bidirectional attention, so it can revise an early token using later context.

Think of it as

a Polaroid photo developing all at once vs a printer typing left to right