There's a question every system runs into the moment it goes to production and starts doing real things: what exactly happened, in what order, against what data — and can you prove it?

AI is just making that question very loud right now. Picture the case that gets more likely with every tool-using agent: a support agent — not a human, an LLM with tool access — cancels a subscription, issues a refund, fires off three follow-up emails. The next day the customer says: I never cancelled. Now answer the question above.

In most codebases the honest answer is: you can see the current state of the database (subscription cancelled), but not the path that got it there. A few log lines the next refactor will overwrite. No reliable record of which actor acted on behalf of which customer. And undoing it means hand-writing a correction and hoping you catch every side effect.

That's not a model problem. GPT wasn't "wrong." The problem sits one layer down.

The AI part is now the easy part