Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation

Google AI team including the Google DeepMind researchers have just released DiffusionGemma, an experimental open model for text generation. It uses text diffusion instead of standard autoregressive decoding. The model ships under a permissive Apache 2.0 license. Google positions it for devs and researchers exploring speed-critical, interactive local workflows. Examples include in-line editing, rapid iteration, and generating non-linear text structures.

Most language models in use today are autoregressive. They generate one token at a time, left to right. Each new token depends on the token before it. DiffusionGemma works differently. It generates entire blocks of text simultaneously, in parallel. On dedicated GPUs, this delivers up to 4x faster generation.

What is DiffusionGemma

DiffusionGemma is a 26B Mixture of Experts (MoE) model. It activates only 3.8B parameters during inference. It is built on the Gemma 4 backbone, specifically the 26B-A4B architecture. Google integrated a diffusion head onto that base.

The model is multimodal. It processes interleaved text, image, and video inputs. It generates text outputs from those inputs. The context window is 256K tokens, and it supports 140+ languages.

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation

Other newsrooms on this story

Related reading

Google launches DiffusionGemma open model for faster local AI workflows

Google's new open model DiffusionGemma generates text from noise instead of…

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

Google's DiffusionGemma runs text 4x faster

Google's latest AI model creates text like an image generator

Google open-sources speedy DiffusionGemma text diffusion model - SiliconANGLE