The nonprofit ARC Prize Foundation on May 1, 2026, released the results of a new benchmark: a test of an AI system’s ability to solve a game. The results were striking – humans scored 100%, while the most advanced AI systems scored under 1%.
At first glance, this result may surprise users of AI who are impressed by the polished essays, codebases and multistep projects it generates in seconds. How can such brilliant AI systems struggle with simple puzzles built from Tetris-like shapes?
That confusion points to a risk: AI is becoming integrated into everyday life faster than people can make sense of it.
We are cognitive psychologists who study how to teach difficult concepts. To recognize the limits and risks of today's AI agent systems, people need to grasp that these systems can both accomplish superhuman feats and make mistakes few humans would. To that end, we propose a new way to think about AIs: as button-pushing explorers.
Mental models for AI
