The 3 AM tool-call loop from hell

Last month I deployed a ReAct-style agent to handle customer support triage. By 3 AM I had an alert: one user session had burned through 47,000 tokens in a single conversation. The agent had been calling the same search_knowledge_base tool 73 times in a row, with slightly different queries each time, never deciding to stop.

If you've built any kind of tool-using agent, you've probably seen this pattern. The agent gets stuck in a loop, either repeating the same action or oscillating between two actions. Tokens evaporate. Costs spike. Users wait forever for a response that never comes.

This isn't a model problem. It's an architectural problem. And once you understand what's actually happening inside the loop, the fix is straightforward.

What's actually happening inside the loop