In brief

Inception Labs' Mercury 2 generates roughly 1,000 tokens per second and scored 90 on the AIME 2026.

Google's recent DiffusionGemma hits similar speeds but performs worse on benchmarks.

DiffusionGemma is free and open-weight on Hugging Face. Mercury 2 is a paid, closed-weight API model.

Inception Labs introduced Mercury 2 on Thursday, calling it the world's fastest reasoning language model. Per the company's announcement, it generates about 1,000 tokens per second (tokens are the chunks of text an AI model reads and writes), compared with roughly 89 tokens per second for Anthropic's Claude Haiku 4.5 Reasoning and 71 for OpenAI's GPT-5 Mini. That puts it in the same speed bracket Google would later claim for DiffusionGemma.
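The short Python sketch below shows what the announced figures mean in practice. It is a back-of-the-envelope estimate with two simplifying assumptions. First, each model is treated as decoding at a constant rate equal to its reported throughput, with time to first token and network overhead ignored. Second, the 2,000-token response length is a hypothetical value chosen only for illustration.

```python
# Output throughput figures as reported in Inception Labs' announcement (tokens/second).
reported_tps = {
    "Mercury 2": 1000,
    "Claude Haiku 4.5 Reasoning": 89,
    "GPT-5 Mini": 71,
}

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 2000


def seconds_for(tokens: int, tps: float) -> float:
    """Time to emit `tokens` output tokens at a constant `tps` tokens/second.

    Assumption: steady decoding rate; ignores time to first token and network latency.
    """
    return tokens / tps


# Print the estimated generation time for each model at its reported rate.
for model, tps in reported_tps.items():
    print(f"{model}: {seconds_for(RESPONSE_TOKENS, tps):.1f} s")

# Print how many times faster Mercury 2's reported rate is than each competitor's.
mercury = reported_tps["Mercury 2"]
for model, tps in reported_tps.items():
    if model != "Mercury 2":
        print(f"Mercury 2 vs {model}: {mercury / tps:.1f}x")
```

Under those assumptions, the reported rates work out to about 2 seconds for Mercury 2, versus roughly 22.5 seconds for Claude Haiku 4.5 Reasoning and 28.2 seconds for GPT-5 Mini. That is a speedup of about 11x and 14x, respectively.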