I wanted to ask questions about my own papers without shipping them to a cloud API. This is the real story of building that — a private, fully-offline RAG with hybrid retrieval and reranking — across a pile of old GPUs and one newer one. Three things each cost me the better part of a day, and none of them were what I expected.
The goal: a private RAG over my own papers
I'm a researcher with a folder of PDFs I can't (and won't) upload to a hosted API. I wanted natural-language, cited answers over that corpus, running entirely on my own hardware. So I built a small tool — paper-rag, about 200 lines of Python — with the whole stack local:
PDFs → chunk → BGE-M3 dense (Ollama) ┐
BM25 sparse (fastembed)┴→ Qdrant (embedded, on disk)






