NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation. NVIDIA has optimized DiffusionGemma to run even faster across NVIDIA GeForce RTX GPUs, the NVIDIA RTX PRO platform and NVIDIA DGX Spark systems, from local PCs to the cloud.

Rather than generating text one word at a time, DiffusionGemma generates multiple words in parallel to output whole blocks of text, opening a new, low-latency frontier for the kind of single-user workloads that developers, researchers and AI enthusiasts run every day.

Features of the new model include:

Parallel generation: DiffusionGemma denoises up to 256 tokens per step instead of predicting one at a time.

Built on Gemma 4: DiffusionGemma pairs a diffusion head with Google’s Gemma 4 architecture, a 26-billion-parameter mixture-of-experts model that activates just 3.8 billion parameters per step.
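
To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It is not DiffusionGemma code and calls no real API. The function names and the `refine_steps_per_block` parameter are hypothetical, and the step count assumes each denoising step works on a full 256-token block, which the announcement does not confirm. Treat the result as a best-case lower bound, not a benchmark.

```python
import math

# Figures quoted in the article.
BLOCK_TOKENS = 256       # up to 256 tokens denoised per step
TOTAL_PARAMS_B = 26.0    # total parameters, in billions
ACTIVE_PARAMS_B = 3.8    # parameters activated per step, in billions


def sequential_steps(num_tokens: int) -> int:
    """Model passes needed when predicting one token at a time."""
    return num_tokens


def min_parallel_steps(num_tokens: int, refine_steps_per_block: int = 1) -> int:
    """Best-case model passes when each step covers a full block.

    refine_steps_per_block is a hypothetical knob: a diffusion model may
    revisit the same block several times before the text is final.
    """
    blocks = math.ceil(num_tokens / BLOCK_TOKENS)
    return blocks * refine_steps_per_block


def active_fraction() -> float:
    """Share of the model's parameters used on any single step."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


if __name__ == "__main__":
    n = 1000
    print("one token at a time:", sequential_steps(n), "passes")
    print("block-parallel, best case:", min_parallel_steps(n), "passes")
    print("block-parallel, 8 refinements per block:", min_parallel_steps(n, 8), "passes")
    print(f"active parameter share: {active_fraction():.1%}")
```

Even with several refinement passes per block, the number of model passes stays well below one per token, which is where the latency advantage for single-user workloads would come from under these assumptions.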
