Three checks that separate an agent demo from a production agent

Shipping an agent demo takes an afternoon. Shipping one that survives a quarter in production is a different job — and the gap is almost never the model. It's three boring things that are usually missing entirely.

I maintain an open, MIT-licensed Agentic Product Standard, and v2.0 was mostly about turning those three things from advice into code you can run. Here they are, with the actual code.

Security is structural, not a filter

The most common mistake is treating safety as a guardrail: an input/output filter near the edge. The problem is that filters have a ceiling. The best content classifiers run around 97% accuracy, which means roughly 3% of prompt-injection attempts get through even when the filter is working exactly as intended. That's not a bug you tune away; it's the nature of filtering.
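To put a number on that ceiling, here is a back-of-the-envelope sketch. It assumes each attempt is blocked independently with the same probability, which is a simplification (real attackers adapt, so this is if anything optimistic), and the function name is mine, not part of the standard.

```python
def p_at_least_one_lands(block_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` injections gets past a filter.

    Assumes every attempt is blocked independently with probability `block_rate`.
    """
    if not 0.0 <= block_rate <= 1.0:
        raise ValueError("block_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # All attempts blocked has probability block_rate ** attempts;
    # the complement is "at least one landed".
    return 1.0 - block_rate ** attempts


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:>3} attempts: {p_at_least_one_lands(0.97, n):.2f}")
    # prints 0.03, 0.26 and 0.95 for 1, 10 and 100 attempts
```

Under those assumptions, a filter that blocks 97% of attempts is close to a coin flip after a couple dozen tries and nearly certain to fail after a hundred. Retries are free for the attacker.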

Real safety comes from architecture. The check I reach for first is Simon Willison's lethal trifecta: an agent becomes an exfiltration tool the moment it has all three of access to private data, exposure to untrusted content, and the ability to communicate externally.
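The trifecta reduces to a three-flag check you can run at deploy time. What follows is a minimal sketch, not the code from the standard: the class, field, and function names are hypothetical, and the three flags follow Willison's published description of the trifecta (private data access, untrusted content, external communication).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability manifest for one agent (illustrative names)."""
    reads_private_data: bool          # e.g. inbox, internal docs, customer records
    ingests_untrusted_content: bool   # e.g. web pages, inbound email, uploads
    can_communicate_externally: bool  # e.g. HTTP requests, sending email


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three legs are present at the same time."""
    return (
        caps.reads_private_data
        and caps.ingests_untrusted_content
        and caps.can_communicate_externally
    )


def assert_safe_to_deploy(name: str, caps: AgentCapabilities) -> None:
    """Fail loudly at deploy time instead of hoping a filter catches the attack."""
    if has_lethal_trifecta(caps):
        raise ValueError(
            f"agent {name!r} combines private data, untrusted content and "
            "external communication; remove at least one before shipping"
        )


if __name__ == "__main__":
    # Two legs out of three: passes the check.
    assert_safe_to_deploy(
        "summarizer",
        AgentCapabilities(
            reads_private_data=True,
            ingests_untrusted_content=True,
            can_communicate_externally=False,
        ),
    )
```

The point of making it a hard failure is that the fix is structural: drop one leg (for example, cut outbound network access for the agent that reads untrusted pages) and the exfiltration path is gone regardless of what the prompt says.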
