Story from 1 source

New math benchmark reveals AI models confidently solve problems that have no solution

A consortium of 64 mathematicians built SOOHAK, a new AI benchmark with 439 handwritten tasks, including 99 that are deliberately unsolvable. Google's Gemini 3 Pro leads on research-level problems at 30 percent, but no model reaches 50 percent on spotting broken tasks. More compute makes models better at solving problems. It does not make them better at admitting that a problem has no answer. SOOHAK tries to pin down the gap between a few flashy results and the broad research skills AI systems still lack.

Reported by

the-decoder.com

Chronological timeline

Sunday, May 17, 2026 · the-decoder.com