Anthropic releases Claude Opus 4.8, which beats GPT-5.5 and Gemini 3.1 Pro on most benchmarks. The model also catches its own coding errors four times more often than its predecessor. Alongside the launch, Anthropic is rolling out dynamic workflows that can spin up hundreds of parallel sub-agents to handle tasks like codebase-wide migrations.

DeepSWE puts GPT-5.5 atop the AI coding leaderboard while raising new questions about Claude Opus, SWE-Bench Pro, and benchmark leakage.

Anthropic releases Claude Opus 4.8 with sharper judgement and fewer uncaught code flaws. Mythos-class models coming in weeks. Series H raises $65B at $965B valuation.

Anthropic says Opus 4.8 is “more likely to flag uncertainties.”

A Mythos-class model is expected in the coming weeks.