Anthropic, OpenAI, or Cursor model for your agent skills? 7 learnings from running 880 evals (including Opus 4.7)

Claude Opus 4.7 shipped last week, and the question any engineering team reaches for is how it...

lunedì 22 giugno 2026 New tab

2,139 words~10 min read

Claude Opus 4.7 shipped last week, and the question any engineering team reaches for is how it compares to its peers.

It is the strongest frontier coding model we tested on the baseline leaderboard, and it will be the easy default a lot of teams reach for.

But in 2026, the model you reach for could matter less than the skill you load with it.

That is what 880 evals across nine models (Opus 4.7, Opus 4.6, Sonnet 4.6, Haiku 4.5, gpt-5.4, gpt-5.3-codex, gpt-5-codex, and Cursor's Composer-2) tell us.

Let’s take a step back. It’s now 2026, and agent skills are spreading like wildfire… (even our favourite movies are catching up to them).

Anthropic, OpenAI, or Cursor model for your agent skills? 7 learnings from running 880 evals (including Opus 4.7)

Anthropic, OpenAI, or Cursor model for your agent skills? 7 learnings from running 880 evals (including Opus 4.7)

Other newsrooms on this story

Related reading

What to know about Claude Opus 4.5 - TechTalks

Is it agentic enough? Benchmarking open models on your own tooling

What Claude Opus 4.8 Actually Changes If You’re Building Agents | Towards AI

Beyond the Hype: Claude Opus 4.8, Parallel Subagents, and the Reality of…

Anthropic's Claude Opus 4.8 is here with 3X cheaper fast mode and near-Mythos…

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that…

Other newsrooms on this story

Related reading

What to know about Claude Opus 4.5 - TechTalks

Is it agentic enough? Benchmarking open models on your own tooling

What Claude Opus 4.8 Actually Changes If You’re Building Agents | Towards AI

Beyond the Hype: Claude Opus 4.8, Parallel Subagents, and the Reality of…

Anthropic's Claude Opus 4.8 is here with 3X cheaper fast mode and near-Mythos…

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that…