The 30-second version

Anthropic shipped Claude Opus 4.8 a few hours ago. Every benchmark on the announcement page is up: SWE-bench Verified, GPQA, MATH-500, the agentic tool-use evals. The marketing copy reads as it always does: "our most capable model", "strongest coding performance", "better instruction following". If you have been around since 4.5, you know the shape of this announcement by heart.

The announcement skipped the only question that matters for teams running Claude in production: should you upgrade today, next week, or next month, and which of your workloads should stay on Opus 4.7 indefinitely? Anthropic does not write that part. They cannot — it is workload-dependent, and the answer for a code-review agent is different from the answer for a customer-facing chat product.

This post is the decision tree I am applying to my own stack today. It is opinionated. Three of the workloads I run are staying on 4.7 until at least mid-July, and I will explain exactly why. Your mileage will vary, but the shape of the reasoning should transfer.

What actually shipped in Opus 4.8