WARPTECH LAB NEWS

Warptech Lab News aggregates the most relevant news from more than 700 international sources, with AI classification, concise TL;DRs, and clustered timelines for individual stories.

© 2026 Sparktech S.R.L. All rights reserved. Site operated and maintained by Sparktech S.R.L.

Registered office: Corso Libertà 55, 13100 Vercelli (VC), Italy · VAT no. / Tax code 02835910023 · Contact: admin@warptechlab.com

Story in 2 sources

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks.

Reported by cryptobriefing.com and the-decoder.com

Source comparison

2 perspectives on the same story
AI · summaries
the-decoder.com · Currently reading · 6 days ago

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in…

cryptobriefing.com · 6 days ago

MirrorCode evaluates AI's long-horizon coding capabilities with 22 open-source tasks

MirrorCode benchmark from METR and Epoch AI tests AI agents on reimplementing entire programs. Claude Opus 4.6 rebuilt a 16,000-line toolkit passing 99.95%


Chronological timeline

  1. Friday, 26 June 2026 · cryptobriefing.com

    MirrorCode evaluates AI's long-horizon coding capabilities with 22 open-source tasks

    MirrorCode benchmark from METR and Epoch AI tests AI agents on reimplementing entire programs. Claude Opus 4.6 rebuilt a 16,000-line toolkit passing 99.95%

  2. Friday, 26 June 2026 · the-decoder.com

    An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

    Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate,…
