Another of the announcements from Google I/O yesterday was Gemini Diffusion, Google's first LLM to use diffusion (similar to image models like Imagen and Stable Diffusion) in place of autoregressive, token-by-token generation. …

DiffusionGemma generates text up to 4x faster than traditional models by producing entire blocks simultaneously, achieving roughly 1,479 tokens per second.

Developers building real-time AI—such as chat assistants, copilots, and agentic workflows—are often constrained by token-by-token generation speed. This limits responsiveness,…

Google’s experimental DiffusionGemma model uses text diffusion to generate blocks of text in parallel, targeting faster local AI inference for developers.

Google AI releases DiffusionGemma, a 26B MoE open text diffusion model generating 256-token blocks in parallel, up to 4x faster.

Google released DiffusionGemma, a 26-billion-parameter model that generates text not token by token but through diffusion, similar to how image AI turns noise into a picture.…

Diffusion AI is most common in image generation, but it can make text outputs much faster.

DiffusionGemma hits 1,000 tokens per second by ditching word-by-word generation entirely. It just doesn't run on most people's machines yet.

Google open-sources speedy DiffusionGemma text diffusion model - SiliconANGLE

Google DeepMind has introduced DiffusionGemma, a groundbreaking AI model that processes text in parallel, delivering up to 4x faster performance on local hardware like gaming GPUs.

Google DeepMind has announced DiffusionGemma, an experimental open-source model based on text generation via diffusion. Thanks to the parallel production of blocks…

What: Google released DiffusionGemma, an open-weight model whose headline trick is parallel...

DiffusionGemma generates blocks of text in parallel instead of one token at a time. It is open source, also runs on consumer GPUs, and can exceed 1,000 tokens per second.

Google's open-source diffusion language model generates 256 tokens in parallel and self-corrects, hitting 4x speed on one GPU at a cost to quality.

DiffusionGemma generates text up to 4x faster than autoregressive LLMs, hits 1,000+ tokens/sec on a single H100, and runs on a consumer RTX 4090. Here is what changed, what the…