In the previous post, we wrote about a very small failure mode:

an AI operator said a task was done, but nothing actually existed on disk.

That sounds like a bug in one workflow. For us, it became a larger operating problem.

The issue is not just whether an agent can finish a task

Most agent tooling focuses on one of three questions: