Introduction

We shipped our first 10-robot demo and thought the hard part was solved. Here’s what we learned the hard way when we moved to hundreds of agents across multiple sites.

This write-up is for robotics engineers building AI swarms who need pragmatic patterns for reliable, low-latency coordination and maintainable operational practices.

The Trigger

Everything looked fine in the lab. Latency was low, commands were acknowledged, and logs said 'success'.