I Built RAG From Scratch in Python to Understand It. Here's What I Learned.

Every RAG tutorial I read used LangChain or LlamaIndex and hid the interesting parts. So I built a 500-line RAG pipeline with no frameworks — just pypdf, ChromaDB, and Ollama. The exercise taught me more about embeddings, chunking, and prompt design than a year of using the high-level libraries. Here's the code, the design, and the parts that surprised me.

Monday, 22 June 2026

I had used LangChain's RAG chain in production for six months. I could not have told you, off the top of my head, what chunk_overlap did, or why cosine similarity is the right similarity measure for comparing embeddings, or how nomic-embed-text actually turns a sentence into a vector. The high-level library abstracted all of it away.
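To make the cosine similarity question concrete, here is a minimal sketch of what the measure computes, assuming embeddings are plain Python lists of floats. The function name and the toy three-dimensional vectors are illustrative only and are not taken from the repository; real embedding models emit vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, made up for illustration.
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]   # same direction as query, twice the length
orthogonal = [0.0, 1.0, 0.0]       # shares no component with query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

The measure depends only on direction, not on vector length, so a vector and a scaled copy of it score as near-identical. That property is one common argument for using it to compare text embeddings.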

So one weekend I deleted the LangChain dependency and wrote a RAG pipeline from scratch in ~500 lines of plain Python. No framework, no magic. pypdf for text extraction. A 60-line chunker. ChromaDB for the vector store. Ollama for embeddings and the LLM. The code is on GitHub: every module is under 200 lines, every test is deterministic, and you can read all of it in one sitting.
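As a rough idea of what a chunker with chunk_overlap does, here is a minimal character-based sliding-window sketch. It is an assumption-laden illustration, not the 60-line chunker from the repository: the function name, the default sizes, and the choice to split on raw characters rather than sentences or tokens are all hypothetical.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into windows of chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so the
        # tail is not repeated in a shorter trailing chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny example so the overlap is visible by eye.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With an overlap of 2, each chunk repeats the last two characters of the previous one. The point of the overlap is that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.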

This is the build log. Not a tutorial — the build log, with the parts that surprised me and the parts I got wrong the first time.

Why bother

The honest reason: I was using LangChain's RetrievalQA chain and getting answers I didn't trust. Sometimes the model would say "according to the document" when the document didn't say that. Sometimes the citations were wrong. I had no way to know if the chunker was dropping important context, or if the cosine similarity was picking the wrong neighbors, or if the prompt was actually constraining the model. The library was a black box.
