If you reach for a vector database the moment you start building RAG, you are solving the wrong problem first. For most domain-specific corpora (technical documentation, company knowledge bases, article archives), BM25-style keyword retrieval is competitive with semantic search, costs a fraction of the compute, and is far simpler to operate. This tutorial shows you how to build a full RAG pipeline using Meilisearch as the retrieval backend, stream responses from an LLM API, and evaluate hit rate, all without a single embedding model.

Why RAG, and why not a vector database

Retrieval-Augmented Generation solves a fundamental problem: LLMs have a knowledge cutoff and a finite context window. You want answers grounded in your documents, not hallucinated from pre-training.

The standard advice is to use a vector database (Pinecone, Weaviate, Chroma). Vector search is powerful for open-domain retrieval, where queries and documents often use different words for the same idea. But on a domain-specific corpus with consistent terminology, such as a cybersecurity knowledge base or a medical reference, a keyword ranker such as BM25 combined with typo tolerance typically achieves 85–95% of the recall you'd get from embeddings, with zero GPU cost, sub-10ms latency, and no embedding pipeline to maintain.
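To make the keyword-ranking idea concrete, here is a minimal sketch of textbook Okapi BM25 scoring in plain Python. It is an illustration only: the whitespace tokenizer, the `k1` and `b` defaults, and the three sample documents are assumptions chosen for the example, and a production search engine will use its own tokenization and ranking rules rather than this exact formula.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by length via b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical sample corpus with consistent domain terminology.
sample_docs = [
    "rotate the api key in the admin console",
    "the firewall blocks inbound traffic on port 22",
    "phishing emails often spoof the sender domain",
]
sample_scores = bm25_scores("firewall port", sample_docs)
best_index = max(range(len(sample_docs)), key=lambda i: sample_scores[i])
print(sample_docs[best_index])
```

Only the document that shares terms with the query receives a non-zero score, which is why this approach works well when users and documents use the same vocabulary, and why it needs no embedding model or GPU.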