DiffusionGemma offers 4x faster output with simultaneous text generation

For years, large language models have worked like a very fast typist: one word at a time, left to right, no looking back. DiffusionGemma throws that playbook out entirely. The open model uses diffusion techniques to produce full blocks of text simultaneously, achieving generation speeds up to four times faster than traditional autoregressive models.

How DiffusionGemma actually works

Traditional language models generate text sequentially. Each token (roughly a word or word fragment) is produced one after another, with each new token depending on everything that came before it.
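To make that sequential dependency concrete, here is a minimal Python sketch of an autoregressive decoding loop. The `next_token` function is a hypothetical stand-in for a real model's forward pass (it just returns the next word of a fixed sentence). What matters is the structure: one model call per token, and each call has to wait for the token before it.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": always continues this fixed sentence.
TARGET = ["the", "cat", "sat", "on", "the", "mat"]

def next_token(prefix: List[str]) -> str:
    # A real model would run a neural network over the whole prefix here.
    return TARGET[len(prefix)]

def autoregressive_generate(
    model: Callable[[List[str]], str], length: int
) -> Tuple[List[str], int]:
    tokens: List[str] = []
    calls = 0
    for _ in range(length):
        # Strictly sequential: step N cannot start until step N-1 is done.
        tokens.append(model(tokens))
        calls += 1
    return tokens, calls

tokens, calls = autoregressive_generate(next_token, len(TARGET))
print(" ".join(tokens), calls)  # the cat sat on the mat 6
```

Six tokens require six dependent model calls. That per-token chain is the bottleneck diffusion-based generation tries to break.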

DiffusionGemma borrows from the same family of techniques that revolutionized image generation. Diffusion models work by starting with noise and iteratively refining it into coherent output. Applied to text, this means the model can work on multiple parts of a response at the same time rather than waiting for each word to be finalized before moving to the next.
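The sketch below illustrates the idea with a toy "masked diffusion" sampler, one common way of applying diffusion to text. It is an assumption for illustration only, not DiffusionGemma's actual algorithm. The sequence starts fully masked, which plays the role of noise. Each call to the hypothetical `denoise` function predicts every position at once, and the sampler commits the most confident predictions before the next pass. The names `denoise` and `tokens_per_step`, and the confidence rule, are all invented for this example.

```python
from typing import List, Tuple

MASK = "[MASK]"
TARGET = ["the", "cat", "sat", "on", "the", "mat"]

def denoise(seq: List[str]) -> List[Tuple[str, float]]:
    # Hypothetical denoiser: one call returns a (token, confidence) guess
    # for EVERY position in parallel. Toy rule: earlier slots are "surer".
    n = len(seq)
    return [(TARGET[i], 1.0 - i / n) for i in range(n)]

def diffusion_generate(length: int, tokens_per_step: int) -> Tuple[List[str], int]:
    seq = [MASK] * length              # start from "noise": every slot masked
    calls = 0
    while MASK in seq:
        preds = denoise(seq)           # refine the whole block in one call
        calls += 1
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # Commit the most confident predictions; the rest stay masked.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = preds[i][0]
    return seq, calls

seq, calls = diffusion_generate(len(TARGET), tokens_per_step=3)
print(" ".join(seq), calls)  # the cat sat on the mat 2
```

Here the same six tokens come out of two model calls instead of six. Real diffusion samplers can also revise tokens they have already placed, and a single parallel pass may cost more than one autoregressive step, so fewer calls does not translate one-to-one into a wall-clock speedup.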

In evaluations, DiffusionGemma has reached sampling speeds of approximately 1,479 tokens per second, and its up-to-4x speed advantage over autoregressive models is a measured benchmark result, not a theoretical ceiling.
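For a rough sense of what those numbers mean in practice, the snippet below converts throughput into response time. It assumes, for illustration only, that the 4x figure applies at the reported 1,479 tokens per second, so the implied autoregressive baseline is 1,479 / 4 ≈ 370 tokens per second. The 1,000-token response length is also hypothetical.

```python
REPORTED_TPS = 1479   # reported DiffusionGemma sampling speed (tokens/s)
SPEEDUP = 4           # reported "up to" speedup over autoregressive models

def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady throughput."""
    return num_tokens / tokens_per_second

# Assumption: the baseline is the reported speed divided by the full 4x.
baseline_tps = REPORTED_TPS / SPEEDUP

t_diffusion = generation_time(1000, REPORTED_TPS)   # hypothetical 1,000-token reply
t_baseline = generation_time(1000, baseline_tps)
print(f"{baseline_tps:.0f} tok/s baseline; {t_diffusion:.2f}s vs {t_baseline:.2f}s")
# 370 tok/s baseline; 0.68s vs 2.70s
```

Under these assumptions, a reply that takes under a second from DiffusionGemma would take close to three seconds at the implied baseline speed.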
