TL;DRAI

Mercury 2 generates 1,000 tokens/sec and scores 90% on AIME 2026 vs. DiffusionGemma's 69.1% via parallel diffusion. Augment Code cut sub-agent latency 82% and cost 90% with Mercury 2—proving parallel generation scales agentic production systems.

In brief

Inception Labs' Mercury 2 generates roughly 1,000 tokens per second and scored 90 on the AIME 2026

Google's recent DiffusionGemma hits similar speeds but performs worse on benchmarks.

DiffusionGemma is free and open-weight on Hugging Face. Mercury 2 is a paid, closed-weight API model.

Inception Labs introduced Mercury 2 on Thursday, calling it the world's fastest reasoning language model. Per the company's announcement, it generates about 1,000 tokens per second—the chunks of text an AI model reads and writes—against roughly 89 tokens per second for Anthropic’s Claude Haiku 4.5 Reasoning and 71 for OpenAI’s GPT-5 Mini.That puts it in the same speed bracket Google would later claim for DiffusionGemma.

decrypt.co

Inception Labs' Mercury 2 AI Beats Google's DiffusionGemma at Its Own Game - Decrypt

Both models trade word-by-word generation for parallel denoising. Only one of them does it without losing intelligence in the trade.

domenica 21 giugno 2026 New tab

TL;DRAI

721 words~3 min read

In brief

Inception Labs' Mercury 2 generates roughly 1,000 tokens per second and scored 90 on the AIME 2026

Google's recent DiffusionGemma hits similar speeds but performs worse on benchmarks.

DiffusionGemma is free and open-weight on Hugging Face. Mercury 2 is a paid, closed-weight API model.

Inception Labs' Mercury 2 AI Beats Google's DiffusionGemma at Its Own Game - Decrypt

Inception Labs' Mercury 2 AI Beats Google's DiffusionGemma at Its Own Game - Decrypt

Other newsrooms on this story

Related reading

Inception Labs' Mercury 2 outperforms Google's DiffusionGemma in the race to…

Inception Labs' Mercury 2 AI outperforms Google's DiffusionGemma: DecryptMedia

Google's DiffusionGemma AI Hits 1,000 Tokens Per Second—And It's Free - Decrypt

Google unveils DiffusionGemma, an AI model that breaks free of left-to-right…

Google's new open model DiffusionGemma generates text from noise instead of…

Is speed becoming AI's next battleground? Google's DiffusionGemma suggests so

Other newsrooms on this story

Related reading

Inception Labs' Mercury 2 outperforms Google's DiffusionGemma in the race to…

Inception Labs' Mercury 2 AI outperforms Google's DiffusionGemma: DecryptMedia

Google's DiffusionGemma AI Hits 1,000 Tokens Per Second—And It's Free - Decrypt

Google unveils DiffusionGemma, an AI model that breaks free of left-to-right…

Google's new open model DiffusionGemma generates text from noise instead of…

Is speed becoming AI's next battleground? Google's DiffusionGemma suggests so