Opus 4.8 scored 81 in my benchmark. I still wouldn't default to it. (The full breakdown + Nate's Community Slack)
Claude Opus 4.8 is excellent. The harder question is where it should replace your current workflow, where it should be a specialist, and where turning the reasoning dial up can make the work worse.