A client called us last month with a simple complaint: "Our support agent confidently quotes the wrong refund policy." The model was fine. The prompt was fine. The problem was three layers down, in the part nobody demos: retrieval. The agent was pulling the wrong chunk of text and then reasoning beautifully over the wrong facts.
This is the quiet truth about Retrieval-Augmented Generation (RAG). When an agent gives a wrong answer, the instinct is to blame the model or "prompt it harder." But in production, the majority of bad answers we debug are retrieval failures, not generation failures. The model did exactly what it was told - it just got handed the wrong context. Here are the five failure modes we see most often, and how we fix them.
1. Chunking that splits a fact in half
The default move is to slice documents into fixed 500-token windows. That works until a fact straddles a boundary - the eligibility rule is in chunk 14, the exception that voids it is in chunk 15, and your retriever returns only chunk 14. The agent now states a rule with total confidence and zero awareness of the exception.
The fix: chunk on structure, not token count. Split on headings, table rows, clauses, and list items. Add a small overlap between adjacent chunks (10-15% of the chunk length) so a fact and its caveat are far less likely to be severed. For policy and contract data, we often store the whole section as one chunk even if it is long - a slightly bloated context beats an amputated fact.
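To make that concrete, here is a minimal sketch of structure-first chunking in Python. It is an illustration under stated assumptions, not our production pipeline: it assumes documents use markdown-style "#" headings, it counts whitespace-separated words as a rough stand-in for tokens, and the names chunk_by_structure, sliding_windows, max_words, and overlap_ratio are all made up for this example. A section that fits under the ceiling stays whole; only an oversized section falls back to overlapping windows.

```python
import re

# Assumption: sections start with markdown-style headings ("# ", "## ", ...).
# The lookahead splits *before* each heading so it stays with its body.
HEADING = re.compile(r"(?m)^(?=#{1,6} )")


def split_sections(text):
    """Return one string per heading-delimited section."""
    return [s.strip() for s in HEADING.split(text) if s.strip()]


def sliding_windows(words, size, overlap):
    """Fallback for oversized sections: fixed windows that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than the window size")
    step = size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the section
    return windows


def chunk_by_structure(text, max_words=400, overlap_ratio=0.15):
    """Keep each section whole when it fits; otherwise window it with overlap."""
    chunks = []
    for section in split_sections(text):
        words = section.split()
        if len(words) <= max_words:
            chunks.append(section)  # rule and exception stay together
        else:
            overlap = int(max_words * overlap_ratio)
            chunks.extend(sliding_windows(words, max_words, overlap))
    return chunks


# Hypothetical policy text, invented for this example.
POLICY = """# Refunds
Refunds are available within 30 days of purchase.
Exception: items bought on sale are final and cannot be refunded.
# Shipping
Orders ship within 2 business days."""

chunks = chunk_by_structure(POLICY)
```

With this sample text, the refund rule and its exception land in the same chunk because they sit under the same heading. The max_words ceiling is a knob, not a recommendation: for policy and contract data you may want it high enough that the fallback rarely triggers.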







