Nearly 65% of enterprise AI failures in 2025 traced back to context drift or memory loss during multi-step reasoning. Not model capability issues. Not hallucinations from weak training data. The agent simply lost track of what it was doing because its context window filled up with conversation history from other agents.
The intuition most teams follow: bigger context window, better agent. Feed everything in. Let the model sort it out. The research says otherwise. Chroma's "Context Rot" study found that performance degrades as input token count grows, and that this held for every model it tested. More tokens in the window means worse decisions, not better ones.
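For multi-agent systems, the problem compounds. Every agent-to-agent exchange adds tokens to both sides, so when each agent re-reads the full shared history on every turn, the total number of tokens processed grows roughly quadratically with the number of exchanges. A 5-agent pipeline sharing context accumulates conversation history faster than any context window can sustainably hold.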
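To make the growth concrete, here is a minimal back-of-the-envelope sketch. It is a toy model, not a measurement: it assumes every message is the same size, every agent reads the entire shared history before it speaks, and nothing is ever trimmed or summarized. The function name and parameters are hypothetical, chosen only for illustration.

```python
def simulate_shared_context(num_agents: int, rounds: int,
                            tokens_per_message: int) -> tuple[int, int]:
    """Toy model of a pipeline where all agents share one growing history.

    Returns (final_history_tokens, total_tokens_processed).
    Assumption: each agent reads the full history, then appends one message.
    """
    history_tokens = 0      # size of the shared conversation history
    processed_tokens = 0    # tokens read across all agent turns so far

    for _ in range(rounds):
        for _ in range(num_agents):
            # The agent reads everything accumulated so far...
            processed_tokens += history_tokens
            # ...and then adds its own message to the shared history.
            history_tokens += tokens_per_message

    return history_tokens, processed_tokens


if __name__ == "__main__":
    for rounds in (4, 8):
        history, processed = simulate_shared_context(
            num_agents=5, rounds=rounds, tokens_per_message=500
        )
        print(f"rounds={rounds}: history={history}, processed={processed}")
    # rounds=4: history=10000, processed=95000
    # rounds=8: history=20000, processed=390000
```

Under these assumptions, doubling the number of rounds doubles the history but roughly quadruples the tokens the agents have to read. Real systems differ in message size and in how much each agent actually sees, so treat the numbers as a shape, not a forecast.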
The Computer Architecture Parallel
A recent arXiv position paper reframes multi-agent memory as a computer architecture problem. The insight: agents communicating through context windows is equivalent to CPUs sharing data through registers. It works for trivial cases. It collapses at scale.