Building a multi-agent document-search copilot — Part 1: muddy results, and one strategy per query

The first version ranked documents badly — and worse, it ranked them badly in a way that looked fine on the architecture diagram. Those are the bugs that get under my skin: every box is green, every arrow points the right way, and the answer is still wrong.

We were building a chat copilot over a regulated document store — the kind where a user types "show me my effective SOPs about equipment cleaning" and expects the right handful of documents back, ranked, with an excerpt and a reason. The v1 design did the obvious thing: run two retrieval lanes in parallel — a structured metadata lane and a semantic content lane — union the hits, rerank the union, render. Clean diagram. Muddy results. We'd open the demo, the pipeline would light up green end to end, and the list that came back was mush: the metadata rows polluted the semantic rank, the relevance scores stopped meaning anything, and there was no clean ordering left to show the user. The architecture was elegant. The experience was not.
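
To make the failure concrete, here is a minimal sketch of the v1 shape, not our actual code. The lane functions, document IDs, and scores are all made up for illustration, the lanes run one after the other rather than in parallel, and a plain sort on the raw score stands in for the real reranker. The assumption that does the damage is that each lane scores on its own scale: a metadata match is all-or-nothing, while a semantic match is a graded similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float  # lane-specific scale; not comparable across lanes
    lane: str


def metadata_lane(query: str) -> list[Hit]:
    # Hypothetical lane: a structured filter either matches or it doesn't,
    # so every returned row carries the same score of 1.0.
    return [Hit("SOP-014", 1.0, "metadata"), Hit("SOP-203", 1.0, "metadata")]


def semantic_lane(query: str) -> list[Hit]:
    # Hypothetical lane: graded similarity between 0 and 1.
    return [Hit("SOP-077", 0.83, "semantic"), Hit("SOP-014", 0.41, "semantic")]


def v1_search(query: str) -> list[Hit]:
    # v1 shape: run both lanes, union the hits, rank the union, render.
    union = metadata_lane(query) + semantic_lane(query)

    # Deduplicate by document, keeping the highest-scoring hit for each.
    best: dict[str, Hit] = {}
    for hit in union:
        if hit.doc_id not in best or hit.score > best[hit.doc_id].score:
            best[hit.doc_id] = hit

    # Stand-in for the reranker: sort on raw scores from two different scales.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    for hit in v1_search("effective SOPs about equipment cleaning"):
        print(f"{hit.doc_id}  {hit.score:.2f}  {hit.lane}")
```

In this toy run, the strongest semantic match (SOP-077) lands below both metadata rows purely because their scores live on a different scale. A real reranker is smarter than a sort, but it is still being asked to order rows that were retrieved for different reasons.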

This is a two-part story of how that became v2: one strategy per query, never mixed; a router that's a single structured-output call; and a Hybrid path that peeks at the data before it decides how to retrieve. It's an architecture post, so I'll keep it anchored in the specific decisions that actually made a difference, not a generic "how to build RAG" walkthrough. Part 1 (this post) covers the problem and the first two reframes. Part 2 covers the hard case (Hybrid) and the permission model.
