Introduction

Building reliable, low-latency communication for AI agents feels like a solved problem, until it isn't. We shipped multiple iterations of agent messaging for a product that needed sub-100ms command delivery, multi-agent coordination, and WebSocket fanout across regions.

Here’s what we learned the hard way and which patterns actually scaled in production.

The Trigger

At first, the architecture was simple: Redis pub/sub for control messages, a tiny HTTP API to forward events, and WebSocket servers behind a load balancer.
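
To make that shape concrete, here is a minimal sketch of the three pieces and how a message flows through them. It is not our production code. To stay runnable without external services, an in-memory broker stands in for Redis pub/sub, a plain function stands in for the HTTP API, and a list per client stands in for a WebSocket connection. The names `InMemoryPubSub`, `WebSocketNode`, `forward_event`, and the channel `agent:control` are all hypothetical.

```python
import asyncio
from collections import defaultdict


class InMemoryPubSub:
    """Stand-in for Redis pub/sub: delivers only to subscribers present at publish time."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of asyncio.Queue

    def subscribe(self, channel):
        queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    def publish(self, channel, message):
        # Returns how many subscribers received the message.
        queues = self._subscribers[channel]
        for queue in queues:
            queue.put_nowait(message)
        return len(queues)


class WebSocketNode:
    """Hypothetical WebSocket server: relays broker messages to its local clients."""

    def __init__(self, name, broker, channel="agent:control"):
        self.name = name
        self.clients = {}  # client_id -> delivered messages (stands in for a socket)
        self._inbox = broker.subscribe(channel)

    def connect(self, client_id):
        self.clients[client_id] = []

    async def relay_once(self):
        # Wait for one control message, then fan it out to every local client.
        message = await self._inbox.get()
        for delivered in self.clients.values():
            delivered.append(message)


def forward_event(broker, event, channel="agent:control"):
    """Stand-in for the small HTTP API: publishes an event onto the control channel."""
    return broker.publish(channel, event)


async def demo():
    broker = InMemoryPubSub()
    node_a = WebSocketNode("ws-a", broker)
    node_b = WebSocketNode("ws-b", broker)
    node_a.connect("agent-1")
    node_b.connect("agent-2")

    receivers = forward_event(broker, {"cmd": "pause"})
    await asyncio.gather(node_a.relay_once(), node_b.relay_once())
    return receivers, node_a.clients, node_b.clients


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch every WebSocket node subscribes to the same channel, so a client connected to any node behind the load balancer receives the command regardless of which node it landed on.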