Standard RAG retrieves once and hopes for the best. Agentic RAG retrieves, reflects on what came back, and tries again when the result falls short, without being told to.

Single-pass RAG has a fundamental flaw: it commits to its first retrieval attempt and generates from whatever came back. It has no mechanism to check whether the retrieved chunks actually contain the answer. This works for simple factual queries. It breaks on multi-hop questions, where the second lookup depends on the result of the first, on queries with ambiguous intent, and on analytical queries that require sequenced lookups.

The Architecture

An agentic RAG system treats retrieval as a tool available to a reasoning loop. The LLM decides what to retrieve, evaluates what came back, and determines when to stop.
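As a rough illustration of retrieval as a tool inside a loop, here is a minimal sketch. Everything in it is an assumption made for the example: the three-sentence `CORPUS`, the keyword-overlap `retrieve` function, and `scripted_policy`, a hand-written stand-in for the decision an LLM would make in a real system. The point is only the control flow: the policy picks the next query or returns `None` to stop.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Toy corpus standing in for a real document index (hypothetical data).
CORPUS = {
    "doc1": "Acme Corp was acquired by Globex in 2019.",
    "doc2": "Globex is headquartered in Springfield.",
    "doc3": "Initech makes accounting software.",
}


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return up to k documents ranked by keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for text in CORPUS.values():
        words = set(text.lower().replace(".", "").split())
        overlap = len(terms & words)
        if overlap:
            scored.append((overlap, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


@dataclass
class State:
    question: str
    context: list[str] = field(default_factory=list)  # chunks gathered so far
    queries: list[str] = field(default_factory=list)  # queries issued so far


# The decision step: return the next query, or None to stop retrieving.
Policy = Callable[[State], Optional[str]]


def scripted_policy(state: State) -> Optional[str]:
    """Hand-written stand-in for the LLM's decision (illustration only)."""
    joined = " ".join(state.context)
    if "acquired by Globex" not in joined:
        return "Acme acquired"          # first hop: who acquired Acme?
    if "headquartered" not in joined:
        return "Globex headquartered"   # second hop: where is the acquirer?
    return None                         # enough context, stop


def agentic_loop(question: str, policy: Policy, max_steps: int = 5) -> State:
    """Let the policy call the retrieval tool until it decides to stop."""
    state = State(question)
    for _ in range(max_steps):          # hard cap so the loop always ends
        query = policy(state)
        if query is None:
            break
        state.queries.append(query)
        for chunk in retrieve(query):
            if chunk not in state.context:
                state.context.append(chunk)
    return state


state = agentic_loop(
    "Where is the company that acquired Acme headquartered?", scripted_policy
)
print(state.queries)
```

The second query only makes sense once the first result is in the context, which is exactly what a single retrieval pass cannot do. The `max_steps` cap is a design choice of this sketch: a loop that decides its own stopping point needs an external bound.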

The key component is a reflection agent that sits between retrieval and generation. It evaluates the quality and sufficiency of the accumulated context, then either terminates the loop and hands the context to generation or triggers another retrieval with a refined query.
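One way to picture the reflection step is as a function that returns a verdict: either the context is sufficient, or here is a refined query. The sketch below assumes that shape. The names `Verdict`, `run_with_reflection`, and the `toy_*` helpers are hypothetical, and the string checks in `toy_reflect` stand in for what would be an LLM judgment in practice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    sufficient: bool                     # is the context enough to answer?
    refined_query: Optional[str] = None  # what to look up next, if not
    reason: str = ""                     # short explanation for logging


def run_with_reflection(
    question: str,
    retrieve: Callable[[str], list[str]],
    reflect: Callable[[str, list[str]], Verdict],
    generate: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> str:
    """Retrieve, reflect, and refine the query until reflection approves."""
    context: list[str] = []
    query = question
    for _ in range(max_rounds):
        for chunk in retrieve(query):
            if chunk not in context:
                context.append(chunk)
        verdict = reflect(question, context)
        # Stop when the context is sufficient or there is nothing to refine.
        if verdict.sufficient or verdict.refined_query is None:
            break
        query = verdict.refined_query
    return generate(question, context)


# --- Toy stand-ins so the sketch runs without a model or an index ---
FACTS = {
    "acme": "Acme Corp was acquired by Globex in 2019.",
    "globex": "Globex is headquartered in Springfield.",
}


def toy_retrieve(query: str) -> list[str]:
    """Return facts whose key appears in the query (hypothetical retriever)."""
    return [text for key, text in FACTS.items() if key in query.lower()]


def toy_reflect(question: str, context: list[str]) -> Verdict:
    """Crude sufficiency check standing in for an LLM-based judge."""
    joined = " ".join(context)
    if "headquartered" in joined:
        return Verdict(True, reason="context names a headquarters")
    if "acquired by Globex" in joined:
        return Verdict(False, "Globex headquarters", "acquirer known, location missing")
    return Verdict(False, None, "nothing relevant retrieved")


def toy_generate(question: str, context: list[str]) -> str:
    """Echo the context; a real system would prompt a model here."""
    return " ".join(context)


question = "Where is the company that acquired Acme headquartered?"
answer = run_with_reflection(question, toy_retrieve, toy_reflect, toy_generate)
single_pass = run_with_reflection(
    question, toy_retrieve, toy_reflect, toy_generate, max_rounds=1
)
```

With `max_rounds=1` the function degenerates to single-pass behaviour and `single_pass` never sees the Globex fact, while the default run picks it up on the second round because the reflection step asked for it.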