Story in 2 sources

AstaBench update: New results, plus adoption from industry | Ai2

AstaBench’s latest update adds new frontier-model results, including GPT-5.5, and highlights growing adoption from groups including the UK AISI, General Reasoning, Elicit, SciSpace, Distyl AI, and EvoScientist.

Reported by

strangeloopcanon.com

allenai.org

Source comparison

2 perspectives on the same story

AI · summaries

allenai.org · Currently reading · 1 month ago

AstaBench update: New results, plus adoption from industry | Ai2

strangeloopcanon.com · 1 month ago

Introducing BenchBench

TL;DR: presenting the ultimate benchmark, getting models to create benchmarks for each other, and GPT 5.2 is the current (only) winner

Chronological timeline

Monday, May 25, 2026 · strangeloopcanon.com
Introducing BenchBench
TL;DR: presenting the ultimate benchmark, getting models to create benchmarks for each other, and GPT 5.2 is the current (only) winner
Tuesday, May 26, 2026 · allenai.org
AstaBench update: New results, plus adoption from industry | Ai2
AstaBench’s latest update adds new frontier-model results, including GPT-5.5, and highlights growing adoption from groups including the UK AISI, General Reasoning, Elicit,…