When Agents Should Stop: Designing Safety Boundaries That Work

An agent in my homelab posted "HEARTBEAT_OK" to the ops channel 47 times over one weekend. Every message was technically correct. The scheduled jobs were healthy, the agent verified them, and it reported in exactly as it was told to. By Monday morning I had muted the channel, which meant the one message that mattered (a failed backup verification) scrolled past unread sometime around 3 AM.

That incident wasn't an alignment problem or a runaway loop. It was a stopping problem. The agent had no concept of "nothing to say," so it said something every time it woke up. Most agent safety writing focuses on preventing harmful actions. In practice, the boundary I've had to engineer most carefully is more mundane: teaching agents when to do nothing and exit quietly.
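A minimal Python sketch of that "nothing to say" boundary, under stated assumptions: the names CheckResult, build_report, and run_once are hypothetical and are not the setup described above, and post stands in for whatever function sends a message to the channel. The idea is that a healthy run produces no report at all, so the caller has nothing to post and exits quietly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    name: str          # hypothetical check name, e.g. "backup_verification"
    healthy: bool
    detail: str = ""   # optional human-readable failure detail


def build_report(results: list[CheckResult]) -> Optional[str]:
    """Return a message only when something needs attention, else None."""
    failures = [r for r in results if not r.healthy]
    if not failures:
        # Nothing to say: signal the caller to exit without posting.
        return None
    lines = [f"{r.name}: {r.detail or 'failed'}" for r in failures]
    return "ATTENTION\n" + "\n".join(lines)


def run_once(results: list[CheckResult], post: Callable[[str], None]) -> bool:
    """Post only if there is a report. Returns True when a message was sent."""
    report = build_report(results)
    if report is None:
        return False
    post(report)
    return True
```

With this shape, a weekend of healthy checks would produce zero messages, and a failed backup verification would be the only thing in the channel. Whether total silence is acceptable depends on how you detect a dead agent; that liveness signal would need to live somewhere other than the channel humans read.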

If you run scheduled agents, autonomous loops, or anything where an LLM makes decisions on a timer, this is for you. The patterns below come from running multi-agent pipelines on my own infrastructure, and from the specific ways they've failed. I covered the theory in Three-Layer Safety for Autonomous Agents; this post is the operational follow-up, the part where theory meets a crash-looping gateway at 2 AM.

Stopping is a feature, not a failure state
