Building a RAG Pipeline From Scratch: What SmartQueue Taught Me About Retrieval

When I set out to add an AI assistant to SmartQueue, a distributed task queue I'd already built in Go for handling IT support tickets, the obvious move was to bolt on an LLM and call it done. Type a question, get an answer. But a generic LLM doesn't know your company's password reset procedure, your P1 outage runbook, or that refunds need manager approval above $500. The assistant needed grounding in actual internal knowledge. That's the job retrieval-augmented generation (RAG) is built for: pull the relevant facts out of your own documents first, then hand them to the model as context instead of trusting it to know your business.

This post walks through how that pipeline actually works, the architectural decision I reversed midway through (and why), the numbers I picked for things like retrieval depth and temperature, and an honest take on whether any of it counts as "real" RAG.

What the assistant actually does

SmartQueue Bot lives inside the Queue Health and AI Bot tabs of the dashboard. An agent picks a ticket, asks a question like "what are the immediate steps for this database outage," and the bot streams back an answer token by token, grounded in a small internal knowledge base of IT runbooks. The request flow looks like this:
