Why AI benchmarks are broken - TechTalks

OpenAI finally released GPT-5.2 after declaring “code red” following Google’s release of Gemini 3 Pro. But did OpenAI reclaim the throne in the AI race? The answer is complicated.

GPT-5.2 sets a new record on the prestigious ARC-AGI-2 benchmark, which evaluates the ability of models to solve visual puzzles that require abstract reasoning. It also leads on several practical benchmarks, including GDPval, SWE-Bench Verified, and GPQA.

Meanwhile, at the time of this writing, GPT-5.2 is still lagging behind Gemini 3 Pro on the overall ranking in the independent Artificial Analysis Index and the Epoch Capabilities Index (ECI). On the Simple Bench leaderboard, which tracks the capabilities of LLMs in simple reasoning questions, GPT-5.2 Pro, the most capable version of the model, stands at a disappointing 8th place.

And if you look on X, you’ll find all kinds of opinions and anecdotes of GPT-5.2 either being one step away from artificial general intelligence (AGI) or too dumb to count the number of r’s in the word “garlic.”

“How many R’s in garlic”I asked this same question to four AI models. Here’s how they performed:GPT 5.2: incorrectGemini 3: correctDeepSeek R1: correctQwen3-Max: correct pic.twitter.com/X6JnJuDuRz— Kyle Chan (@kyleichan) December 12, 2025

OpenAI finally released GPT-5.2 after declaring “code red” following Google’s release of Gemini 3 Pro. But did OpenAI reclaim the throne in the AI race? The answer is complicated.

Why AI benchmarks are broken - TechTalks

Why AI benchmarks are broken - TechTalks

Other newsrooms on this story

Related reading

OpenAI aims to show its not falling behind its rivals with GPT-5.2 release |…

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude…

There Is No Best AI Model in 2026 — And That's Actually Good News

OpenAI releases GPT-5.6 and ChatGPT Work tool

OpenAI GPT-5.6: AI Could Do Anything, Then It Met ARC-AGI-3

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam…

Other newsrooms on this story

Related reading

OpenAI aims to show its not falling behind its rivals with GPT-5.2 release |…

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude…

There Is No Best AI Model in 2026 — And That's Actually Good News

OpenAI releases GPT-5.6 and ChatGPT Work tool

OpenAI GPT-5.6: AI Could Do Anything, Then It Met ARC-AGI-3

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam…