Anthropic shipped Claude Opus 4.8 today. The benchmark numbers went up, as they always do. But that's not why I'm switching my default model, and I want to explain the part that actually changed how I work.
The numbers, quickly
Here's the official comparison:
The highlights:
SWE-Bench Pro: 69.2%, up 4.9 points from 64.3% on Opus 4.7, and well ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%).