An agent in my homelab posted "HEARTBEAT_OK" to the ops channel 47 times over one weekend. Every message was technically correct. The scheduled jobs were healthy, the agent verified them, and it reported in exactly like it was told to. By Monday morning I had muted the channel, which meant the one message that mattered (a failed backup verification) scrolled past unread sometime around 3 AM.

That incident wasn't an alignment problem or a runaway loop. It was a stopping problem. The agent had no concept of "nothing to say," so it said something every time it woke up. Most agent safety writing focuses on preventing harmful actions. In practice, the boundary I've had to engineer most carefully is more mundane: teaching agents when to do nothing and exit quietly.

If you run scheduled agents, autonomous loops, or anything where an LLM makes decisions on a timer, this is for you. The patterns below come from running multi-agent pipelines on my own infrastructure, and from the specific ways they've failed. I covered the theory in Three-Layer Safety for Autonomous Agents; this post is the operational follow-up, the part where theory meets a crash-looping gateway at 2 AM.

Stopping is a feature, not a failure state