Google launches DiffusionGemma open model for faster local AI workflows

Google has introduced DiffusionGemma, an experimental open model designed to generate text faster by using diffusion instead of the token by token process behind most large language models.

DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs.

Instead of predicting word-by-word, it generates entire blocks of text simultaneously. This lets the model self-correct and format complex markdown in real time. pic.twitter.com/S62OSbfWff

— Google DeepMind (@GoogleDeepMind) June 10, 2026

The model is a 26 billion parameter Mixture of Experts system released under an Apache 2.0 license. Google said DiffusionGemma activates only 3.8 billion parameters during inference and can run within 18GB of VRAM when quantized, making it suitable for high end consumer GPUs.

Google launches DiffusionGemma open model for faster local AI workflows

Other newsrooms on this story

Related reading

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion…

Google's latest AI model creates text like an image generator

Google's new open model DiffusionGemma generates text from noise instead of…

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

Google unveils DiffusionGemma, an AI model that breaks free of left-to-right…