When I set out to add an AI assistant to SmartQueue, a distributed task queue I'd already built in Go for handling IT support tickets, the obvious move was to bolt on an LLM and call it done. Type a question, get an answer. But a generic LLM doesn't know your company's password reset procedure, your P1 outage runbook, or that refunds need manager approval above $500. The assistant needed grounding in actual internal knowledge. That's the job retrieval-augmented generation (RAG) is built for: pull the relevant facts out of your own documents first, then hand them to the model as context instead of trusting it to know your business.

This post walks through how that pipeline actually works, the architectural decision I reversed midway through (and why), the numbers I picked for things like retrieval depth and temperature, and an honest take on whether any of it counts as "real" RAG.

What the assistant actually does

SmartQueue Bot lives inside the Queue Health and AI Bot tabs of the dashboard. An agent picks a ticket, asks a question like "what are the immediate steps for this database outage," and the bot streams back an answer token by token, grounded in a small internal knowledge base of IT runbooks. The request flow looks like this: