TL;DR

Mercury 2 (Inception Labs, Feb 2026) reaches roughly 1,009 tokens/sec at $0.25 per million input tokens and $0.75 per million output tokens, outperforming Google's DiffusionGemma at preserving complex reasoning during parallel generation. Diffusion inference could reshape GPU workload economics and put pressure on current budget-tier pricing.

The AI industry just got its first real horse race in diffusion-based language models, and the startup is beating the tech giant. Inception Labs’ Mercury 2, which launched in February 2026, is outperforming Google DeepMind’s DiffusionGemma on a metric that matters more than raw speed: maintaining sophisticated reasoning while generating text in parallel.

Here’s why that distinction is important. Traditional large language models, the kind powering ChatGPT and Claude, generate text one token at a time, left to right, like a typewriter. Diffusion language models (dLLMs) take a fundamentally different approach, generating multiple tokens simultaneously through a denoising process. In plain terms: instead of writing a sentence word by word, they sketch the whole thing at once and then refine it, more like a painter than a typist.
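To make the typist-versus-painter contrast concrete, here is a toy Python sketch. It is not how Mercury 2 or DiffusionGemma work internally: there is no neural network, the `target` list simply stands in for whatever a model would predict, and the function names and the `num_steps` parameter are hypothetical. It only illustrates why filling many positions per pass needs fewer sequential steps than emitting one token at a time.

```python
import random

MASK = "_"  # placeholder for a position that has not been filled in yet


def autoregressive_decode(target):
    """Emit one token per step, strictly left to right."""
    out = []
    steps = 0
    for tok in target:
        out.append(tok)  # a real model would sample this token from a network
        steps += 1
    return out, steps


def diffusion_style_decode(target, num_steps, seed=0):
    """Start fully masked and fill a batch of positions per step, in any order."""
    rng = random.Random(seed)
    n = len(target)
    draft = [MASK] * n
    hidden = list(range(n))
    rng.shuffle(hidden)  # positions are not filled left to right
    per_step = -(-n // num_steps)  # ceiling division: positions filled per pass
    steps = 0
    while hidden:
        batch, hidden = hidden[:per_step], hidden[per_step:]
        for i in batch:
            draft[i] = target[i]  # toy stand-in for one parallel denoising update
        steps += 1
    return draft, steps


tokens = "the quick brown fox jumps over the lazy dog".split()
ar_text, ar_steps = autoregressive_decode(tokens)
dl_text, dl_steps = diffusion_style_decode(tokens, num_steps=3)
print(ar_steps, dl_steps)
```

Both functions end with the same nine-token sentence, but the sequential version takes one step per token while the parallel version finishes in three passes. Real diffusion models also revise tokens between passes, which this sketch leaves out.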

The numbers behind Mercury 2’s edge

Mercury 2 pushes roughly 1,009 tokens per second when running on NVIDIA’s Blackwell GPUs. That throughput figure alone would be impressive, but Inception Labs paired it with pricing that undercuts established competitors: $0.25 per million input tokens and $0.75 per million output tokens.

The company positions those rates as competitive against Claude 4.5 Haiku and GPT-5.2 Mini, both of which are already considered the budget-friendly speed options in the market.
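For a rough sense of what those figures mean per request, the arithmetic can be sketched in a few lines of Python. The prices and the throughput come from the numbers quoted above; the request size (2,000 input tokens, 500 output tokens) is a hypothetical workload, and the timing estimate assumes steady throughput with no network latency or time-to-first-token overhead, which real deployments will not match exactly.

```python
# Figures quoted in the article; everything else here is illustrative.
INPUT_PRICE_PER_M = 0.25   # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.75  # USD per million output tokens
TOKENS_PER_SEC = 1009      # reported throughput on NVIDIA Blackwell GPUs


def request_cost_usd(input_tokens, output_tokens):
    """Cost of a single request at the quoted per-million-token rates."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


def generation_seconds(output_tokens, tokens_per_sec=TOKENS_PER_SEC):
    """Idealized generation time, assuming constant throughput."""
    return output_tokens / tokens_per_sec


# Hypothetical request: 2,000 tokens in, 500 tokens out.
cost = request_cost_usd(2_000, 500)
seconds = generation_seconds(500)
print(f"cost: ${cost:.6f}, generation time: {seconds:.2f}s")
```

Under these assumptions, the example request costs about $0.000875 and the 500 output tokens take roughly half a second to generate.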

cryptobriefing.com

Inception Labs' Mercury 2 outperforms Google's DiffusionGemma in the race to replace autoregressive AI

Inception Labs' Mercury 2 diffusion language model outperforms Google's DiffusionGemma, generating 1,009 tokens per second while retaining reasoning

Sunday, 21 June 2026

Other newsrooms on this story

Related reading

Inception Labs' Mercury 2 AI outperforms Google's DiffusionGemma: DecryptMedia

Inception Labs' Mercury 2 AI Beats Google's DiffusionGemma at Its Own Game -…

DiffusionGemma: How Google's New Open LLM Hits 1,000 Tokens/sec and Changes…

Is speed becoming AI's next battleground? Google's DiffusionGemma suggests so

Google unveils DiffusionGemma, an AI model that breaks free of left-to-right…

Google launches DiffusionGemma open model for faster local AI workflows
