Story in 2 sources

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks.

Reported by

cryptobriefing.com

the-decoder.com

Source comparison

2 perspectives on the same story


the-decoder.com, 6 days ago

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Original article

cryptobriefing.com, 6 days ago

MirrorCode evaluates AI's long-horizon coding capabilities with 22 open-source tasks

MirrorCode benchmark from METR and Epoch AI tests AI agents on reimplementing entire programs. Claude Opus 4.6 rebuilt a 16,000-line toolkit passing 99.95%


Chronological timeline

MirrorCode evaluates AI's long-horizon coding capabilities with 22 open-source tasks

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run