You built an AI agent. In the demo it was magic. In the wild it loops, hallucinates a tool call, "forgets" the format you asked for twice, and occasionally does something mildly alarming with your filesystem.
Here's the uncomfortable truth after shipping a lot of these: a flaky agent is almost never a model problem. It's a specification problem. The model is doing exactly what your prompt, your tools, and your control loop told it to do — which turns out to be far less than you thought you said.
Below are seven rules that consistently move an agent from "cool demo" to "I trust this on real work." None of them require a bigger model. Three of them are paste-able guardrails for Claude Code specifically.
1. Make the success criteria machine-checkable, not vibes
"Summarize this well" is unfalsifiable. The agent can't tell when it's done, and neither can you. Replace every vibe with something a script could check:






