FrontierMath benchmark undergoes major audit as Epoch AI flags errors in roughly one-third of math problems
Epoch AI's audit of its FrontierMath benchmark flagged errors in roughly one-third of the benchmark's 350 math problems, raising questions about AI capability measurements.