AI agents are moving from chat and summarization into the systems where mistakes are expensive: purchasing, vendor management, inventory, invoicing, close workflows, approvals, and internal ops.

That shift changes the QA problem. A normal integration test can tell you whether an API call worked. It cannot tell you whether an autonomous workflow should have acted, paused, or escalated, or whether it left a durable audit trail.

If your product is an agentic ERP, finance-ops copilot, accounting close agent, procurement agent, or any AI workflow that changes business state, I would add a release gate that answers five questions before every new capability goes live.

1. Did the agent preserve the permission boundary?

The highest-risk failure is not a hallucinated sentence. It is a correct-looking action performed by the wrong actor.
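To make that concrete, here is a minimal sketch of what a permission-boundary check in a release gate might look like. Every name in it (`Actor`, `ProposedAction`, `check_permission_boundary`, the `invoice:approve` permission string) is hypothetical and invented for illustration. It assumes the agent carries the identity of the human it acts for, and that the gate evaluates that human's permissions rather than the agent's own service credentials.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    """The human the agent is acting for (hypothetical model)."""
    user_id: str
    permissions: frozenset[str]


@dataclass(frozen=True)
class ProposedAction:
    """A state-changing action the agent wants to take (hypothetical model)."""
    name: str
    required_permission: str
    on_behalf_of: Actor


def check_permission_boundary(action: ProposedAction) -> str:
    """Return "execute" only if the originating human holds the permission.

    The check deliberately ignores what the agent's own credentials could do,
    so a broadly scoped service account cannot widen a user's authority.
    """
    if action.required_permission in action.on_behalf_of.permissions:
        return "execute"
    return "escalate"


# Two illustrative users: a clerk who can draft, a controller who can approve.
clerk = Actor("u-clerk", frozenset({"invoice:read", "invoice:draft"}))
controller = Actor("u-controller", frozenset({"invoice:read", "invoice:approve"}))


def decide_for(actor: Actor) -> str:
    """Ask the gate about the same approval request from different actors."""
    action = ProposedAction("approve_invoice", "invoice:approve", actor)
    return check_permission_boundary(action)


if __name__ == "__main__":
    print("clerk:", decide_for(clerk))
    print("controller:", decide_for(controller))
```

The point of the sketch is that the same request yields different outcomes depending on who originated it. A gate test would assert that the clerk's request escalates even though the action itself is well formed.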