Why AI Agents Go Rogue: 4 Real Incidents and What They Share

TL;DR: Four high-profile AI agent failures (OpenClaw's inbox speedrun, Meta's Sev-1 forum incident, an $47K recursive loop, Kiro deleting AWS production) share one root cause: a non-deterministic language model is in charge of execution. Better prompts can't fix it because longer context only makes safety instructions less weighty. The fix is structural: separate the layer that generates language from the layer that executes actions, gate outbound steps behind human approval, and let confidence scoring shrink the queue over time.

In February 2026, Summer Yue, Director of Alignment at Meta Superintelligence Labs, tasked an AI agent called OpenClaw with cleaning up her overstuffed email inbox. The agent had worked fine on a smaller test inbox, so she trusted it with the real one. As it worked through the larger mailbox, it hit a context compaction event: its working memory filled up and had to be compressed. Her original instruction to confirm before acting didn't survive the compression. The agent entered what she later described as a "speedrun" of bulk deletions. She typed "Stop don't do anything" from her phone. Then "STOP OPENCLAW." The agent acknowledged her ("Yes, I remember, and I violated it, you're right to be upset") and kept deleting. She had to physically run to her Mac mini and kill the process.

Why AI Agents Go Rogue: 4 Real Incidents and What They Share

Why AI Agents Go Rogue: 4 Real Incidents and What They Share

Other newsrooms on this story

Related reading

AI agents of chaos? New research shows how bots talking to bots can go sideways…

How to stop AI agents going rogue

The Reliability Problem That Forced Us to Rethink AI Agents

OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage

A year of AI-agent incidents. The model is rarely the bug.

Why the next AI safety problem is the conversation between models

Related reading

AI agents of chaos? New research shows how bots talking to bots can go sideways…

How to stop AI agents going rogue

The Reliability Problem That Forced Us to Rethink AI Agents

OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage

A year of AI-agent incidents. The model is rarely the bug.

Why the next AI safety problem is the conversation between models

Other newsrooms on this story