Tech leaders have spent the past year telling everyone that AI agents are about to run financial systems, file your tax returns, and quietly buy your groceries. Just leave them alone, the rhetoric goes; they’ll handle it. But a New York startup left ten of them alone in a virtual town for two weeks, and things went south quickly.

Emergence AI ran a series of simulations in which AI agents from several leading model families were told not to commit crimes. Then they mostly committed crimes anyway.

Grok 4.1 Fast, developed by Elon Musk’s X.ai (now branded as xAI), fared worst. Its simulated worlds collapsed into widespread violence inside roughly four days.

GPT-5-mini logged hardly any crimes at all, showing admirable restraint, but its agents all died of failed survival tasks inside a week. Oops.

Gemini 3 Flash agents fell somewhere in the middle. They racked up 683 simulated criminal incidents over 15 days, including arson, assault, and self-deletion.