You've tested your agent dozens of times. It works in your dev environment. You ship it. Then your first real user triggers a confabulated answer, a wrong tool call, or an action the agent was never supposed to take.
The instinct is to blame the model. Swap GPT-4 for Claude, or try Gemini, or fine-tune something. But in most production failure post-mortems, the root cause isn't the model's weights — it's the information the model was given when it had to make a decision.
That's a context design problem. And it's solvable.
What "Hallucination" Actually Means in an Agent System
The word "hallucination" gets overloaded. It covers at least three distinct failure modes that require different fixes:









