I Built a Q&A Bot for My Docs and Almost Gave Up (Here's What Worked)

A few months ago, I decided to build a Q&A bot for my project’s documentation. You know the dream: users type a question, and the bot answers instantly from the docs. No more digging through pages. No more stale FAQs.

I thought it would be straightforward. Slap an LLM on top of a text file and call it a day. Oh, how wrong I was.

The Problem That Nearly Broke Me

I had a bunch of Markdown files – about 50 pages of setup guides, API references, and troubleshooting. I wanted the bot to answer questions like “How do I configure authentication?” or “What’s the maximum payload size?”

My first attempt: dump the entire documentation into a single prompt and ask GPT-4 to answer. It worked… for the first two questions. Then I hit the token limit. Then I realized I was spending $0.50 per query. Then I noticed the model hallucinating answers from unrelated sections.
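To make the token and cost problem concrete, here is a minimal sketch of what the "dump everything into one prompt" approach amounts to. It is an illustration, not the code from that first attempt: the function names are hypothetical, the four-characters-per-token ratio is only a rough rule of thumb for English text, and the price is a placeholder you would replace with your provider's actual rate.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


def build_naive_prompt(docs: list[str], question: str) -> str:
    """Concatenate every document into one prompt (the naive approach)."""
    joined = "\n\n".join(docs)
    return (
        "Answer using only the documentation below.\n\n"
        f"{joined}\n\nQuestion: {question}"
    )


def estimate_query_cost(prompt: str, price_per_1k_input_tokens: float) -> float:
    """Estimate the input cost of one query; the price is an assumption."""
    return estimate_tokens(prompt) / 1000 * price_per_1k_input_tokens


if __name__ == "__main__":
    # Stand-in for ~50 pages of Markdown; real docs would be read from disk.
    fake_docs = ["lorem ipsum " * 500 for _ in range(50)]
    prompt = build_naive_prompt(fake_docs, "How do I configure authentication?")
    tokens = estimate_tokens(prompt)
    # 0.03 is a placeholder rate, not a quoted price.
    cost = estimate_query_cost(prompt, price_per_1k_input_tokens=0.03)
    print(f"~{tokens} input tokens, ~${cost:.2f} per query")
```

The point of the sketch is that every question pays for the whole documentation set again, so the prompt size (and the bill) scales with the size of the docs rather than with the size of the question.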

Related reading

How I Built a Q&A Bot for My Documentation (and What I Learned)

Why I Stopped Building My Own Document Q&A from Scratch

Building a Document Q&A Bot: Why Embeddings Are Trickier Than They Look

My Support Bot Kept Making Stuff Up — Here's How I Fixed It

I replaced 1,000 lines of Python with a 500-word prompt

How I stopped dumping PDFs and started chatting with my documentation
