Anthropic recently dropped the updated Claude 4.6 lineup, and as usual, the two names everyone cares about are Opus 4.6 and Sonnet 4.6.

Opus is the expensive “best possible” model, and Sonnet is the cheaper, more general one that a lot of people actually use day to day. So I wanted to see what the real gap looks like when you ask both to build something serious, not a toy demo.

Benchmark-wise, there’s a difference of course, but it doesn’t look that huge when it comes to software engineering (SWE) and agentic coding.

I kept it super basic: one test (but a big one), same prompt, same workflow. Then I just compared how close each model got to a working result without me stepping in.

TL;DR