Anthropic shipped Claude Opus 4.8 today. The benchmark numbers went up, as they always do. But that's not why I'm switching my default model, and I want to explain the part that actually changed how I work.

The numbers, quickly

Here's the official comparison:

The highlights:

SWE-Bench Pro: 69.2%, up 4.9 points from 64.3% on 4.7 and well ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%).