The failure mode you don't see
Most agentic systems fail silently. The agent picks the wrong tool, invents a fact, agrees with the user when it shouldn't — and you find out three releases later when a customer points it out. There's no exception to log; the system is generating plausible-looking text.
The standard answer is "we'll add evals." Then evals become a number on a dashboard, the dashboard becomes a vanity metric, and the agent keeps drifting. The wrong frame is not "we need more evals" — it's that the agent has no skin in the game during a single turn.
This post is about a runtime that gives the agent skin in the game on every turn. 23 invariants, deontic tags, counter-clauses, risk-weighted sampling, an adversarial probe set, and a pre-registered statistical decision rule for whether a config change ships. None of the components are original. The combination is what makes it work.
The co-system framing






