When an AI agent fails in production, the first instinct is usually to tweak the prompt and rerun the workflow.

That can make the incident harder to understand.

The rerun may change the model output, retrieved context, tool state, timing, permissions, or external API response. If the agent already sent an email, issued a refund, changed a ticket, or called an MCP tool, a naive rerun can also repeat a side effect.

A better workflow starts by preserving evidence from the failed run before changing anything.

This checklist is for developers debugging production AI agents that use tools, retrieval, memory, workflows, or external APIs. The goal is not to make every run deterministic. The goal is to find the first unsupported decision and turn the failure into a replayable regression.