When a single agent does something dangerous, the audit problem is small. You have one run, one set of tool calls, one receipt stream, and one place to ask who, what, and why.
When a team of agents works on the same task, the audit problem becomes much harder, and the most common reaction is to glue everything together with a shared log. That is usually the wrong answer.
The thing that breaks first in a multi-agent run
In our own work, the thing that breaks first is not tool correctness. It is the handoff.
Concretely: agent A reads a ticket, plans a fix, and decides that the actual file edits should be done by agent B because B has the right tooling and a tighter permission scope. A asks B to do the edits. B does them. The user wakes up the next morning, looks at the PR, and asks "who changed this and why?"
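To make the scenario concrete, here is a minimal sketch of the information that question needs. It is an illustration under assumptions, not a description of any particular framework: the `Handoff` and `Edit` records, their field names, the `explain` helper, and the sample values (`agent-A`, `ticket-123`, `src/app.py`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Handoff:
    """One delegation from a requesting agent to an executing agent (hypothetical schema)."""
    handoff_id: str
    from_agent: str   # who asked
    to_agent: str     # who was asked to act
    ticket: str       # what the work traces back to
    reason: str       # why the work was delegated, in the requester's words


@dataclass(frozen=True)
class Edit:
    """One file change, tagged with the handoff that caused it, if any."""
    path: str
    agent: str
    handoff_id: Optional[str]  # None means no handoff was recorded for this edit


def explain(edit: Edit, handoffs: Dict[str, Handoff]) -> str:
    """Answer "who changed this and why?" for a single edit."""
    handoff = handoffs.get(edit.handoff_id) if edit.handoff_id else None
    if handoff is None:
        return f"{edit.agent} changed {edit.path} (no handoff recorded)"
    return (
        f"{edit.agent} changed {edit.path} at the request of "
        f"{handoff.from_agent} for {handoff.ticket}: {handoff.reason}"
    )


if __name__ == "__main__":
    # Illustrative values only; none of these names come from a real run.
    h = Handoff("h-1", "agent-A", "agent-B", "ticket-123",
                "B has the edit tooling and a tighter permission scope")
    print(explain(Edit("src/app.py", "agent-B", "h-1"), {"h-1": h}))
```

Without the `handoff_id` link, the edit can only be attributed to B, and the reason stays behind in A's run.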