Story in 2 sources

AI scores a ‘C–’ on its hardest math test yet

The second batch of “First Proof” problems is meant to evaluate AI’s usefulness for research-level math. The best model got six or seven of the 10 questions right

Reported by

scientificamerican.com

nature.com

Source comparison

2 perspectives on the same story

AI · summaries

scientificamerican.com · You are reading · 1 month ago

AI scores a ‘C–’ on its hardest math test yet

The second batch of “First Proof” problems is meant to evaluate AI’s usefulness for research-level math. The best model got six or seven of the 10 questions right

original

nature.com · 1 month ago

Humans outperform AI at this highly rigorous mathematics test

First Proof test on ten unpublished research math problems: the top model (ETH's ChatGPT harness with advisory council) scored 6/10, below expert mathematicians. This reveals critical limits in autonomous mathematical reasoning, constraining AI as a research assistant or proof-checker in enterprise R&D.

Chronological timeline

Wednesday, 10 June 2026 · scientificamerican.com
AI scores a ‘C–’ on its hardest math test yet
The second batch of “First Proof” problems is meant to evaluate AI’s usefulness for research-level math. The best model got six or seven of the 10 questions right
Friday, 12 June 2026 · nature.com
Humans outperform AI at this highly rigorous mathematics test
A new benchmark pitting AI against previously unseen maths problems shows systems still fall short of top human expertise.