Google's new open model DiffusionGemma generates text from noise instead of word by word

Google released an experimental model with open weights that generates text through diffusion instead of word by word. On a single GPU, it runs up to four times faster in single-user mode than classic language models. Nvidia handled the optimization.

Most language models generate one token after another, conditioning each new token on all the tokens that came before it. DiffusionGemma takes a different approach. It starts with a block of 256 random placeholder tokens and refines the whole block across several passes until readable text emerges. The idea comes from image AI, where diffusion models turn noise into clear images.
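To make the refinement loop concrete, here is a toy sketch in Python. It is not DiffusionGemma's actual sampler, which the article does not describe in detail. The `toy_denoiser` function, the `<mask>` placeholder, the confidence-based commit rule, and the choice of 8 passes are all assumptions made for illustration. Only the block size of 256 comes from the article.

```python
import random

BLOCK_SIZE = 256   # block length named in the article
MASK = "<mask>"    # hypothetical placeholder token (assumption)


def toy_denoiser(block, target):
    """Stand-in for the model (assumption): propose a token and a
    confidence score for every position in the block at once."""
    return [(target[i], random.random()) for i in range(len(block))]


def diffusion_decode(target, steps=8, seed=0):
    """Fill a fully masked block over a fixed number of parallel passes."""
    random.seed(seed)
    block = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceiling division
    passes = 0
    for _ in range(steps):
        open_positions = [i for i, tok in enumerate(block) if tok == MASK]
        if not open_positions:
            break
        proposals = toy_denoiser(block, target)
        passes += 1
        # Commit the most confident open positions, leave the rest masked.
        open_positions.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in open_positions[:per_step]:
            block[i] = proposals[i][0]
    return block, passes


target = [f"w{i}" for i in range(BLOCK_SIZE)]
out, passes = diffusion_decode(target, steps=8)
print(f"filled {len(out)} tokens in {passes} passes")
```

In this toy, 256 positions are filled in 8 passes, where a token-by-token decoder would need 256 sequential steps. Real speedups depend on how many passes the model needs and how expensive each pass is.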

The model has 26 billion parameters total but only activates 3.8 billion per step. That's thanks to a mixture-of-experts architecture, where several specialized sub-networks sit side by side and only the right ones fire depending on the input. When quantized to lower precision, the model fits into 18 GB of VRAM on high-end consumer GPUs, according to Google. It builds on the Gemma 4 family and borrows its diffusion process from Google's earlier research on Gemini Diffusion.
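The parameter figures above lend themselves to some quick arithmetic. The sketch below computes what share of the model is active per step and how much memory the weights alone would need at a few bit widths. The article does not say which precision Google used to reach 18 GB, so the bit widths are illustrative assumptions. The estimate also ignores activations, caches, and runtime overhead.

```python
TOTAL_PARAMS = 26e9    # total parameters, from the article
ACTIVE_PARAMS = 3.8e9  # parameters active per step, from the article


def active_fraction(total, active):
    """Share of the parameters used in a single step."""
    return active / total


def weight_memory_gb(params, bits_per_weight):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active share per step: {share:.1%}")

# Bit widths are assumptions for illustration, not from the article.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{bits}-bit weights: {gb:.0f} GB")
```

Note that a mixture-of-experts model typically still has to keep all of its experts in memory, so the memory estimate uses the full 26 billion parameters even though only about 15 percent of them fire per step.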

DiffusionGemma generates far more tokens per second than the autoregressive Gemma 4 models but scores slightly lower on accuracy. | Image: Google
