How I stopped dumping PDFs and started chatting with my documentation

A few months ago I was drowning in documentation. My team had written hundreds of pages about our internal microservices, configuration guides, and deployment procedures. Great, right? Except that nobody read them. The same questions popped up in Slack every week. "How do I reset the staging DB?" "What's the syntax for that webhook?"

I tried throwing a basic search index on top of the wiki. It was terrible. People would type "reset staging database" and get back a page about resetting production credentials. Context? Gone. Synonyms? Useless.

So I did what any developer would do: I spent two weekends building a RAG (Retrieval-Augmented Generation) system from scratch. Here’s what I learned, including the dead ends that wasted my time.

The naïve approach: dump PDFs into a vector database

I started with the classic recipe: PDFs → text splitter → OpenAI embeddings → Pinecone. Simple. It worked... for one question. For everything else it returned irrelevant junk.
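To make the shape of that recipe concrete, here is a minimal sketch of the split → embed → store → query loop. It is an illustration, not the code from the actual project. The fixed-size character splitter is an assumption about what "text splitter" means here. `toy_embed` is a bag-of-words stand-in for OpenAI embeddings, and `InMemoryIndex` is a stand-in for Pinecone, so the whole thing runs offline. All function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Cut text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryIndex:
    """Stand-in for a vector database: stores (vector, chunk) pairs in a list."""

    def __init__(self):
        self.items = []

    def upsert(self, chunks):
        for chunk in chunks:
            self.items.append((toy_embed(chunk), chunk))

    def query(self, question, top_k=3):
        q = toy_embed(question)
        scored = [(cosine(q, vec), chunk) for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical document text, standing in for text extracted from a PDF.
    doc = (
        "Reset the staging database with the reset script. "
        "Rotate production credentials every quarter."
    )
    index = InMemoryIndex()
    index.upsert(split_text(doc, chunk_size=60, overlap=10))
    for score, chunk in index.query("reset staging database", top_k=2):
        print(f"{score:.2f}  {chunk!r}")
```

Note that the splitter cuts purely by character count, so a chunk can start or end mid-sentence. A real embedding model would capture more meaning than this word-count stand-in, but it would still be embedding whatever fragments the splitter hands it.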

Related reading

How I Built a Q&A Bot for My Documentation (and What I Learned)

Why I Stopped Building My Own Document Q&A from Scratch

I Built a Q&A Bot for My Docs and Almost Gave Up (Here's What Worked)

How I Used Semantic Search to Stop Drowning in API Docs

Developer Documentation Platforms in 2026: GitBook, Mintlify, ReadMe,…

Nobody Reads My Docs Anymore—Not Even the AI Agents
