Every experiment in this post ran in staging. Production was double-locked: a ## Chaos Rules block in CLAUDE.md forbidding production targets, and a PreToolUse hook that exits 2 if --env=production shows up in any chaos command. I'll show both at the end. The point of saying this upfront is that "I let Claude design chaos experiments" is the kind of sentence people read sideways. The TL;DR is: staging only, twice-locked, and the whole exercise was supervised end to end.
With that out of the way: Steadybit released what is widely described as the first chaos engineering MCP server in mid-2025. I plugged Claude Code into it and asked, in a single sentence, to design experiments that test payment-service's resilience under connection-pool stress. Claude proposed four of them. Three came back without an SLO breach. The fourth took staging down completely. When I traced the failure, it wasn't a contrived test bug. It was a real production pattern that had been flickering in our logs for 6 months and that we had never been able to reproduce: pool exhaustion → retry storm → rate limiter self-DoS. Here is the run, the bug, and the three guardrails I now require before letting any AI design chaos experiments.






