Diffusion text models — which draft an entire block of text at once and then iteratively refine it, rather than generating one token at a time left to right — have moved from research curiosity to a real two-horse race this week. Google released DiffusionGemma as an open-weight model, and Inception Labs launched Mercury 2 as a hosted service, both betting that parallel generation is the future of fast text.
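
To make the contrast concrete, here is a toy sketch of the two decoding styles. It is not the sampling procedure of DiffusionGemma or Mercury 2, which this article does not describe. The "model" is a hypothetical stand-in (`toy_predict`) that already knows the answer and returns a made-up confidence score, and "refinement" is simplified to progressively filling in masked positions, most confident first. The only point is the step count: one position per step versus several positions per step.

```python
from typing import List, Optional, Tuple

# Assumed toy target; a real model would not know the answer in advance.
TARGET = "parallel decoding fills many positions per step".split()


def toy_predict(position: int) -> Tuple[str, float]:
    """Hypothetical stand-in for a model: returns a token and a confidence."""
    # The confidence is an arbitrary deterministic score, not a real probability.
    confidence = ((position * 7) % 5 + 1) / 5
    return TARGET[position], confidence


def autoregressive_decode() -> Tuple[List[str], int]:
    """Emit one token per step, strictly left to right."""
    out: List[str] = []
    steps = 0
    for pos in range(len(TARGET)):
        token, _ = toy_predict(pos)
        out.append(token)
        steps += 1
    return out, steps


def block_refine_decode(tokens_per_step: int = 3) -> Tuple[List[str], int]:
    """Start with a fully masked block; each step commits several positions."""
    block: List[Optional[str]] = [None] * len(TARGET)
    steps = 0
    while any(tok is None for tok in block):
        # Score every still-masked position in a single pass over the block.
        candidates = [
            (toy_predict(pos), pos)
            for pos, tok in enumerate(block)
            if tok is None
        ]
        # Commit the most confident predictions first, in any order of position.
        candidates.sort(key=lambda c: c[0][1], reverse=True)
        for (token, _), pos in candidates[:tokens_per_step]:
            block[pos] = token
        steps += 1
    return [tok for tok in block if tok is not None], steps


if __name__ == "__main__":
    ar_text, ar_steps = autoregressive_decode()
    bl_text, bl_steps = block_refine_decode(tokens_per_step=3)
    print(f"left to right: {ar_steps} steps -> {' '.join(ar_text)}")
    print(f"block refine:  {bl_steps} steps -> {' '.join(bl_text)}")
```

Both functions produce the same seven-word sentence, but the left-to-right loop takes seven steps while the block version takes three. In a real system each refinement step processes the whole block, so a lower step count does not translate directly into a proportional speedup.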

Key facts

What: Diffusion text models generate text in parallel blocks rather than one token at a time, left to right; Google's open-weight DiffusionGemma and Inception Labs' hosted Mercury 2 are now competing head to head on speed.

When: 2026-06-22
