Introduction: The Place of Large Models in RAG and Lingering Questions

Retrieval-Augmented Generation (RAG) systems pair large language models (LLMs) with an information retrieval step, enabling them to produce more accurate and contextually relevant responses. The quality of the retrieved external data (the context) directly affects the model's output. However, how large a model a RAG architecture truly needs has become a significant topic of discussion once cost, performance, and complexity are taken into account. I've had the opportunity to dig into this question in my own projects.

Especially with enterprise datasets, the model's role in finding information (retrieval), not just in processing it, can determine the overall success of the system. So do we always have to use the largest, most capable model? Or can smaller, more focused models deliver the same retrieval quality? In this post, I examine whether large models are truly indispensable for improving retrieval quality in RAG systems, using concrete examples from my own experience.

What is Retrieval Quality and Why is it Important?

Retrieval quality in a RAG system measures how effectively the system finds the most appropriate and accurate information for a user's query in the relevant external sources. This involves not only retrieving the correct documents but also identifying the most relevant passages within them. If the retrieval phase returns irrelevant or incomplete information, the generated response will be inaccurate or insufficient, no matter how advanced the LLM.
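To make "retrieval quality" a bit more concrete, here is a minimal sketch of two commonly used metrics, recall@k and reciprocal rank. These are my choice for illustration, not the only way to measure retrieval; the function names and the passage IDs below are hypothetical and exist only for this example.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant:
        return 0.0  # nothing to find, so recall is defined as 0 here
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none was retrieved."""
    for rank, passage_id in enumerate(retrieved, start=1):
        if passage_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked retriever output and hand-labeled relevant passages.
retrieved = ["p7", "p2", "p9", "p4"]
relevant = {"p2", "p4"}

print(recall_at_k(retrieved, relevant, k=3))  # 0.5 (only p2 is in the top 3)
print(reciprocal_rank(retrieved, relevant))   # 0.5 (first hit is at rank 2)
```

Averaging these values over a set of labeled queries gives one number per retriever, which makes it possible to compare a small model against a large one on the same data.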