Debugging multi-agent AI: When the failure is in the space between agents

I’ve been building a multi-agent research system. The idea is simple: give it a controversial technical topic like “Should we rewrite our Python backend in Rust?”, and three agents work on it. An Advocate argues for it, a Skeptic argues against, and a Synthesizer reads both briefs blind and produces a balanced analysis. Each agent has its own model, its own tools, its own system prompt.
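
To make the handoff concrete, here is a minimal sketch of that pipeline shape. It is not the actual implementation: `run_debate`, the `Agent` type, and the callables passed in are hypothetical names, and it assumes "blind" means the Synthesizer receives the brief texts without knowing which agent wrote which.

```python
from typing import Callable, Sequence

# Hypothetical signature: an agent takes a topic and returns its brief as text.
Agent = Callable[[str], str]


def run_debate(
    topic: str,
    advocate: Agent,
    skeptic: Agent,
    synthesize: Callable[[Sequence[str]], str],
) -> str:
    """Run both sides on the topic, then hand only the brief texts to the synthesizer."""
    briefs = [advocate(topic), skeptic(topic)]
    # Assumption: "blind" means no role labels are attached to the briefs.
    return synthesize(briefs)
```

The point of the sketch is that the Synthesizer only ever sees the finished briefs, so anything that went wrong upstream reaches it as ordinary-looking text.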

It worked great in testing. Then I noticed the Synthesizer kept producing analyses that leaned heavily toward one side. Not wrong, but noticeably lopsided. I mean, rewriting the Sentry monorepo in Rust is arguably a bad idea, but it was coming down against the rewrite even on points where I knew the case for it was stronger.

I eventually traced it to the Skeptic’s web_search tool. The Advocate’s searches were returning 3-4 solid data points per query. The Skeptic, however, was searching for different terms that didn’t match the available data as well, and was getting back a single generic result per query. So the Advocate’s brief was well-sourced with citations, and the Skeptic’s brief was… vibes. The Synthesizer did what any reasonable reader would do: it weighted the better-sourced argument more heavily.

The bug was in a tool call, inside one agent, that silently degraded the input to a completely different agent two steps later. I only found it by clicking through the trace and reading tool outputs at each step.
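
A check like the following could surface that kind of imbalance without reading every trace by hand. It is a rough sketch under stated assumptions: the trace format (a list of dicts with `agent`, `tool`, and `results` keys), the function names, and the `min_ratio` threshold are all made up for illustration and do not come from any particular tracing tool.

```python
from collections import defaultdict
from statistics import mean


def results_per_query(tool_calls):
    """Average number of web_search results per query, grouped by agent."""
    counts = defaultdict(list)
    for call in tool_calls:
        if call["tool"] == "web_search":
            counts[call["agent"]].append(len(call["results"]))
    return {agent: mean(c) for agent, c in counts.items()}


def find_starved_agents(tool_calls, min_ratio=0.5):
    """Agents whose average result count is below min_ratio of the best agent's."""
    averages = results_per_query(tool_calls)
    if not averages:
        return []
    best = max(averages.values())
    return sorted(a for a, avg in averages.items() if avg < min_ratio * best)


# Hypothetical trace data shaped like the failure described above.
trace = [
    {"agent": "advocate", "tool": "web_search", "results": ["a", "b", "c"]},
    {"agent": "advocate", "tool": "web_search", "results": ["a", "b", "c", "d"]},
    {"agent": "skeptic", "tool": "web_search", "results": ["generic"]},
]
print(find_starved_agents(trace))
```

Counting results is a crude proxy for evidence quality, but it is cheap, and it points at the agent whose tool output deserves a closer look before the downstream agent gets blamed.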
