OpenAI finally released GPT-5.2 after declaring “code red” following Google’s release of Gemini 3 Pro. But did OpenAI reclaim the throne in the AI race? The answer is complicated.
GPT-5.2 sets a new record on the prestigious ARC-AGI-2 benchmark, which evaluates the ability of models to solve visual puzzles that require abstract reasoning. It also leads on several practical benchmarks, including GDPval, SWE-Bench Verified, and GPQA.
Meanwhile, at the time of this writing, GPT-5.2 is still lagging behind Gemini 3 Pro on the overall ranking in the independent Artificial Analysis Index and the Epoch Capabilities Index (ECI). On the Simple Bench leaderboard, which tracks the capabilities of LLMs in simple reasoning questions, GPT-5.2 Pro, the most capable version of the model, stands at a disappointing 8th place.
And if you look on X, you’ll find all kinds of opinions and anecdotes of GPT-5.2 either being one step away from artificial general intelligence (AGI) or too dumb to count the number of r’s in the word “garlic.”
“How many R’s in garlic”I asked this same question to four AI models. Here’s how they performed:GPT 5.2: incorrectGemini 3: correctDeepSeek R1: correctQwen3-Max: correct pic.twitter.com/X6JnJuDuRz— Kyle Chan (@kyleichan) December 12, 2025








