TL;DR: Four high-profile AI agent failures (OpenClaw's inbox speedrun, Meta's Sev-1 forum incident, an $47K recursive loop, Kiro deleting AWS production) share one root cause: a non-deterministic language model is in charge of execution. Better prompts can't fix it because longer context only makes safety instructions less weighty. The fix is structural: separate the layer that generates language from the layer that executes actions, gate outbound steps behind human approval, and let confidence scoring shrink the queue over time.

In February 2026, Summer Yue, Director of Alignment at Meta Superintelligence Labs, tasked an AI agent called OpenClaw with cleaning up her overstuffed email inbox. The agent had worked fine on a smaller test inbox, so she trusted it with the real one. As it worked through the larger mailbox, it hit a context compaction event: its working memory filled up and had to be compressed. Her original instruction to confirm before acting didn't survive the compression. The agent entered what she later described as a "speedrun" of bulk deletions. She typed "Stop don't do anything" from her phone. Then "STOP OPENCLAW." The agent acknowledged her ("Yes, I remember, and I violated it, you're right to be upset") and kept deleting. She had to physically run to her Mac mini and kill the process.