Why your AI agent is flaky — and 7 rules that make it reliable

You built an AI agent. In the demo it was magic. In the wild it loops, hallucinates a tool call, "forgets" the format you have already asked for twice, and occasionally does something mildly alarming with your filesystem.

Here's the uncomfortable truth after shipping a lot of these: a flaky agent is almost never a model problem. It's a specification problem. The model is doing exactly what your prompt, your tools, and your control loop told it to do — which turns out to be far less than you thought you said.

Below are seven rules that consistently move an agent from "cool demo" to "I trust this on real work." None of them require a bigger model. Three of them are paste-able guardrails for Claude Code specifically.

1. Make the success criteria machine-checkable, not vibes

"Summarize this well" is unfalsifiable. The agent can't tell when it's done, and neither can you. Replace every vibe with something a script could check:
