Most agent architectures are quietly broken. They look great in a demo — single turn, clean task, instant response — and then fall apart the moment you ask them to do anything real. Like process a week of insurance claims. Or run a multi-day sales sequence. Or reconcile data across systems that don't share a clock.

The reason is simple, and it rarely gets talked about: most agents are stateless under the hood. They reconstruct context from scratch on every interaction. The reasoning chain that made the last decision make sense? Gone. The soft signals, the confidence gradients, the partial progress? All gone. You get a polite LLM that pretends to know what's going on.

Addy Osmani and Shubham Saboo from Google Cloud just published five patterns for fixing this. I read the whole thing. Here's what actually matters.

The Five Patterns (In My Words)

1. Checkpoint-and-Resume. Treat your agent like a long-running server, not a request handler. Checkpoint progress every N units of work: not every unit (wasteful), not just at the end (risky). If your agent dies on document 201 of 1,000, you resume from the last checkpoint instead of starting over at zero. The code sample in the article checkpoints every 50 docs, which is a reasonable default. Tune the interval to how expensive each unit is to redo.
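To make that concrete, here's a minimal sketch of the idea in Python. This is not the article's code sample. The names (`run`, `load_checkpoint`, `save_checkpoint`, `CHECKPOINT_EVERY`) and the JSON-file storage are my own assumptions, picked to keep the example small. A real agent would likely checkpoint to a database or object store and save more than an index.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical default, mirroring the every-50-docs interval mentioned above.
CHECKPOINT_EVERY = 50


def load_checkpoint(path: Path) -> int:
    """Return the index of the next unprocessed document (0 if no checkpoint)."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["next_index"]


def save_checkpoint(path: Path, next_index: int) -> None:
    """Write to a temp file, then rename, so a crash mid-write can't corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent))
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run(docs, process, checkpoint_path: Path, every: int = CHECKPOINT_EVERY) -> None:
    """Process docs in order, resuming from the last saved checkpoint."""
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(docs)):
        process(docs[i])
        done = i + 1
        # Checkpoint every `every` docs, plus once at the very end.
        if done % every == 0 or done == len(docs):
            save_checkpoint(checkpoint_path, done)
```

One thing the sketch makes obvious: if the agent dies between checkpoints, the docs processed since the last checkpoint get processed again on resume. So `process` needs to be safe to repeat, or you need a smaller interval.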