The AI industry just got its first real horse race in diffusion-based language models, and the startup is beating the tech giant. Inception Labs’ Mercury 2, which launched in February 2026, is outperforming Google DeepMind’s DiffusionGemma on a metric that matters more than raw speed: maintaining sophisticated reasoning while generating text in parallel.
Here’s why that distinction is important. Traditional large language models, the kind powering ChatGPT and Claude, generate text one token at a time, left to right, like a typewriter. Diffusion language models (dLLMs) take a fundamentally different approach, generating multiple tokens simultaneously through a denoising process. In English: instead of writing a sentence word by word, they sketch the whole thing at once and then refine it, more like a painter than a typist.
The numbers behind Mercury 2’s edge
Mercury 2 pushes roughly 1,009 tokens per second when running on NVIDIA’s Blackwell GPUs. That throughput figure alone would be impressive, but Inception Labs paired it with pricing that undercuts established competitors: $0.25 per million input tokens and $0.75 per million output tokens.
The company positions those rates as competitive against Claude 4.5 Haiku and GPT-5.2 Mini, both of which are already considered the budget-friendly speed options in the market.







