Search bug or model bug - testing a RAG system to tell them apart

I'm an automation tester. Usually my job is simple: the same input should give the same output, every time. Language models don't work that way. Ask the same question twice and you can get two different answers, and both can be right.
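A minimal sketch of what that means for a test, assuming nothing about the repo: instead of comparing the answer to one exact string, check that it contains the facts it must contain. The helper name `contains_all` and the two sample answers are made up for illustration.

```python
import re


def contains_all(answer: str, required: list[str]) -> bool:
    """Return True if every required fact appears in the answer.

    Hypothetical helper: matching is a case-insensitive substring check
    after collapsing whitespace, which is crude but deterministic.
    """
    normalized = re.sub(r"\s+", " ", answer).lower()
    return all(fact.lower() in normalized for fact in required)


# Two differently worded answers to the same question (invented examples).
answer_a = "You can return items within 14 days of delivery."
answer_b = "Returns are accepted up to 14 days after delivery."

# An exact-match test would fail on one of these...
assert answer_a != answer_b
# ...while a fact-based check accepts both.
assert contains_all(answer_a, ["14 days"])
assert contains_all(answer_b, ["14 days"])
```

A substring check like this will miss paraphrases ("two weeks"), so it is only a starting point, but it shows the shift from "same output" to "same facts".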

A RAG system (retrieval-augmented generation) makes it harder still. It searches your own documents and has a model write the answer from what it finds: think chat-with-your-PDF, or a support bot answering from a company's help pages. So a wrong answer has two possible causes: the search picked the wrong page, or it picked the right page and the model still got it wrong. To the user these look the same, but they're different problems with different fixes. If your tests can't tell them apart, you don't know which half to fix.
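The split between the two causes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the repo: `RagResult`, `classify`, the page names, and the expected fact are all hypothetical. The idea is to check retrieval first, and only blame the model if the right page actually reached it.

```python
from dataclasses import dataclass


@dataclass
class RagResult:
    """Hypothetical record of one RAG run: what was retrieved, what was answered."""
    retrieved_ids: list[str]
    answer: str


def classify(result: RagResult, expected_id: str, expected_fact: str) -> str:
    """Label a run as 'ok', 'search bug', or 'model bug'."""
    # Retrieval is checked first: if the right page never reached the model,
    # a wrong answer cannot be pinned on the model.
    if expected_id not in result.retrieved_ids:
        return "search bug"
    # The right page was retrieved, so a missing fact points at generation.
    # Assumption: a case-insensitive substring check stands in for a real
    # answer-quality check.
    if expected_fact.lower() not in result.answer.lower():
        return "model bug"
    return "ok"


# Invented examples: the expected page is refunds.md, the expected fact "14 days".
wrong_page = RagResult(["pricing.md"], "Refunds take 30 days.")
right_page_wrong_answer = RagResult(["refunds.md"], "Refunds take 30 days.")
all_good = RagResult(["refunds.md"], "Refunds are issued within 14 days.")

for run in (wrong_page, right_page_wrong_answer, all_good):
    print(classify(run, "refunds.md", "14 days"))
```

Both failing runs give the user the same wrong answer, yet the labels differ, and that difference is what tells you which half to fix.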

So I built a small RAG system and a test suite designed to tell the two apart.

Repo: https://github.com/sbezjak/llm-rag

What it is
