DiffusionGemma generates text up to 4x faster than autoregressive LLMs, hits 1,000+ tokens/sec on a single H100, and runs on a consumer RTX 4090. Here is what changed, what the trade-offs are, and how to deploy it today.

DiffusionGemma generates text up to 4x faster than traditional models by producing entire blocks simultaneously, achieving roughly 1,479 tokens per second.

The new DiffusionGemma open model generates text in parallel — not one token at a time — and is optimized to run on the NVIDIA RTX PRO platform, NVIDIA DGX Spark systems and…