Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

Anthropic releases Claude Opus 4.8, which beats GPT-5.5 and Gemini 3.1 Pro in most benchmarks. The model also catches its own coding errors four times more often than its predecessor. Alongside the launch, Anthropic is rolling out dynamic workflows that can spin up hundreds of parallel sub-agents to handle tasks like codebase-wide migrations.

giovedì 28 maggio 2026 New tab

Anthropic's latest flagship model, Claude Opus 4.8, leads most benchmarks and is designed to be more upfront about its own mistakes.

Anthropic says Opus 4.8 beats both its predecessor and OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro across most tested categories. On agentic coding (SWE-Bench Pro), the model hits 69.2 percent, up from 64.3 percent for Opus 4.7 and 58.6 percent for GPT-5.5. For multidisciplinary reasoning (Humanity's Last Exam), Opus 4.8 scores 49.8 percent without tools and 57.9 percent with tools, the highest marks in the field.

Opus 4.8 stacked against Opus 4.7, GPT-5.5, and Gemini 3.1 Pro. | Image: Anthropic

Less fake progress, more honesty

Anthropic calls the model's improved honesty one of its most noticeable upgrades. AI models have a habit of jumping to conclusions and claiming progress that falls apart on closer look. It's a widespread problem.

Anthropic's latest flagship model, Claude Opus 4.8, leads most benchmarks and is designed to be more upfront about its own mistakes.

Opus 4.8 stacked against Opus 4.7, GPT-5.5, and Gemini 3.1 Pro. | Image: Anthropic

Less fake progress, more honesty

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

Other newsrooms on this story

Related reading

Claude Opus 4.8 vs GPT 5.5 vs Gemini 3.1 Pro - Complete breakdown of latest AI…

Anthropic launches Claude Opus 4.8 today: Features, pricing and how it compares…

Claude Opus 4.8 vs GPT-5.5: What's Anthropic AI's new Ultracode mode, pricing,…

What to know about Claude Opus 4.5 - TechTalks

Anthropic Debuts More Honest AI Model As Competition Intensifies

Claude Opus 4.8 is out. The benchmark isn't why I'm switching.

Other newsrooms on this story

Related reading

Claude Opus 4.8 vs GPT 5.5 vs Gemini 3.1 Pro - Complete breakdown of latest AI…

Anthropic launches Claude Opus 4.8 today: Features, pricing and how it compares…

Claude Opus 4.8 vs GPT-5.5: What's Anthropic AI's new Ultracode mode, pricing,…

What to know about Claude Opus 4.5 - TechTalks

Anthropic Debuts More Honest AI Model As Competition Intensifies

Claude Opus 4.8 is out. The benchmark isn't why I'm switching.