TL;DR

AI agents scored 0% on expert ALE tasks; mid-tier success was 15–21%, confirming the demo-reality gap. Tech teams must abandon autonomous architectures, scope tasks narrowly, enforce human-in-the-loop on critical work, and stop building roadmaps around unproven capabilities.

Top AI agents scored zero percent on expert-level professional tasks, according to the ALE benchmark. Not a low score. Not a disappointing score. Not a single pass.

Enjoy this satisfying round number while your timeline fills up with threads about how agents will replace your entire engineering team by Q3.

What ALE Actually Showed

ALE, short for Agents' Last Exam, is a benchmark that tests AI agents on problems demanding real professional expertise. Not the "summarize this PDF" kind of problem, but hard, domain-specific work that experts in the field do.

The findings were grim. Fable 5 and GPT-5.5 were among the models tested. On the hardest tier, the expert-level "Last-Exam" problems, they managed a 0% pass rate, although partial credit was non-zero. A coin flip would have been more impressive.
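To see how a 0% pass rate can coexist with non-zero partial credit, here is a minimal sketch. Everything in it is an assumption for illustration: the scores are made up (not ALE data), and the all-or-nothing pass rule and the names `pass_rate` and `mean_partial_credit` are hypothetical, not ALE's actual scoring method.

```python
from statistics import mean

# Hypothetical rubric scores in [0, 1] for five tasks. Illustrative only, not ALE data.
scores = [0.40, 0.25, 0.60, 0.10, 0.35]

# Assumed rule: a task counts as passed only if it is fully completed.
PASS_THRESHOLD = 1.0


def pass_rate(task_scores, threshold=PASS_THRESHOLD):
    """Fraction of tasks whose score reaches the pass threshold."""
    return sum(s >= threshold for s in task_scores) / len(task_scores)


def mean_partial_credit(task_scores):
    """Average rubric score across tasks, regardless of pass/fail."""
    return mean(task_scores)


print(f"pass rate: {pass_rate(scores):.0%}")
print(f"mean partial credit: {mean_partial_credit(scores):.0%}")
```

Under these assumed numbers the agent earns about a third of the available credit on average and still passes nothing, which is why the strict metric and the lenient one tell such different stories.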

dev.to

AI agents scored 0% on expert tasks. The hype machine doesn't care.

Friday, 19 June 2026
