The next serious agent failure won’t look like a jailbreak. It’ll look like an email sent because the thread seemed to imply approval, a customer record updated because the old value looked stale, a pull request opened because the tests passed and the change looked done. None of that requires the model to misbehave, which is what makes it hard. The risk starts where the product gets useful: when language turns into action.

A chat demo lives in suggestion space. The model drafts, summarizes, answers, proposes, and if it’s wrong, the user rejects it. The cost is local. A production agent lives closer to consequence: it can notify someone, expose private information, change a shared record, trigger a workflow, or spend money. That moves a question demos never had to answer to the center of the product: who decides whether the agent should be allowed to act?

A better prompt doesn’t really answer it. Telling the model to “be careful” doesn’t either. Approval modals technically reduce risk but ruin the workflow. Users either click through out of habit or stop using the system. The answer that’s actually working is architectural: a separate judge wrapped around the actor, deciding whether each proposed action should move forward. If you’re building agents that act, this is the layer of the product you cannot bolt on later.

Here’s what’s inside:

- The Lindy example. How a multi-channel agent product hit the failure mode every production system eventually faces, and the architectural fix that worked.
- Why prompting and approval modals both fail. The structural reasons a single prompt can’t pursue a task and police it at the same time.
- Orchestration is not judgment. Why coordinating agents and judging their actions are different problems with different homes in the stack.
- The builder toolkit. Action classification, proposals, specialist judges, eval, memory governance, and what to build first.
- The OpenBrain Judge Extender guide + the prompt kit that builds your first judge. Five prompts that take you from “my agent acts” to a working judge at your highest-risk boundary, plus the full implementation spec for wiring that judge to durable memory, provenance, and structured write-back so it doesn’t start every session from zero.

Start with the team that hit this wall publicly and figured out what to do about it.