Google unveils DiffusionGemma, an AI model that breaks free of left-to-right processing

Extremely powerful large language models (LLMs) still operate as though they’re typing on a keyboard, processing workloads in a simple left-to-right fashion. But in locally-run, single-user scenarios, this sequential processing can leave graphics processing units (GPUs) and tensor processing units (TPUs) underutilized.

Google is betting that DiffusionGemma can get around this bottleneck. The new experimental open model generates text “exceptionally fast,” creating entire blocks of text simultaneously through diffusion techniques rather than through token-by-token processing. The company says this technique results in 4x faster inference compared to auto-regressive models that rely on sequential processing.

It can also save users money. Technology analyst Carmi Levy noted that existing pay-per-token monetization models “penalize the use of less than optimally efficient AI solutions.”

But DiffusionGemma “could herald a new generation of task-defined, efficient solutions that can enable expanded compute capacity without draining the operations budget,” he said.

A contrast to left-to-right processing

Google unveils DiffusionGemma, an AI model that breaks free of left-to-right processing

Other newsrooms on this story

Related reading

Google launches DiffusionGemma open model for faster local AI workflows

Other newsrooms on this story

Related reading

Google launches DiffusionGemma open model for faster local AI workflows

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

Google's latest AI model creates text like an image generator

Google's new open model DiffusionGemma generates text from noise instead of…

Google's DiffusionGemma runs text 4x faster

Is speed becoming AI's next battleground? Google's DiffusionGemma suggests so