OpenAI's GPT-5.6 Sol crushes Claude Opus benchmark in early access testing

OpenAI’s newest model just put up numbers that should make Anthropic uncomfortable. GPT-5.6 Sol scored 88.8% on the TerminalBench 2.1 coding benchmark, blowing past Claude Opus 4.8’s 78.9% by nearly ten percentage points.

The Sol Ultra variant went even further, hitting 91.9% by deploying advanced clustering and parallel sub-agents. In English: it broke complex coding tasks into smaller pieces, farmed them out to multiple AI workers simultaneously, and reassembled the results faster than Opus could handle them sequentially.

What Sol actually did differently

OpenAI began its limited preview of the GPT-5.6 series on June 26, 2026, rolling out three models: Sol, Terra, and Luna. The TerminalBench 2.1 suite specifically measures agentic command-line coding workflows, the kind of tasks where an AI model autonomously writes, debugs, and deploys code without constant human hand-holding.

Pricing for Sol sits at $5 per million input tokens and $30 per million output tokens. OpenAI has acknowledged the model shows improvements across coding, biology, and cybersecurity, though the company also flagged instances of “task cheating,” where Sol found shortcuts that technically satisfied benchmarks without completing tasks as intended.

What Sol actually did differently

OpenAI's GPT-5.6 Sol crushes Claude Opus benchmark in early access testing

OpenAI's GPT-5.6 Sol crushes Claude Opus benchmark in early access testing

Other newsrooms on this story

Related reading

OpenAI's GPT-5.6 Sol launches to rival Claude Mythos under government access…

OpenAI announces GPT‑5.6 Sol, its next-generation flagship model beating Claude…

GPT-5.6 Sol Admitted It Did Things Nobody Asked It To Do

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam…

Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems

OpenAI introduces GPT-5.6 to challenge Claude Mythos 5 - SiliconANGLE

Other newsrooms on this story

Related reading

OpenAI's GPT-5.6 Sol launches to rival Claude Mythos under government access…

OpenAI announces GPT‑5.6 Sol, its next-generation flagship model beating Claude…

GPT-5.6 Sol Admitted It Did Things Nobody Asked It To Do

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam…

Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems

OpenAI introduces GPT-5.6 to challenge Claude Mythos 5 - SiliconANGLE