How to build crash-proof, resumable AI agents with Temporal's durable execution: a DeepAgents-style developer experience where killing the process doesn't kill the run.
If you've built an AI agent that does real work (calling tools, delegating to sub-agents, looping until a task is done), you've probably felt this particular kind of pain:
The agent is nine steps into a twelve-step job. It has searched the web, written three files, and delegated to a sub-agent. Then the process dies. A deploy, an OOM kill, a dropped network connection, a transient 500 from your model provider. Whatever the cause, the result is the same: the entire run is gone. All that state lived in process memory, and process memory just evaporated.
Durability usually isn't the first thing you reach for when prototyping an agent, and for good reason: it's plumbing, not the fun part. But once an agent starts doing real work, it's worth taking seriously. This article is about a mental model that makes durability almost free: an agent is not an object in memory, it's a durable workflow. It also covers how to build agents that survive crashes, restarts, and infrastructure failures by running them on Temporal.
I'll also show you a small open-source library I've been building, durable-agents, that packages this pattern so you don't have to write the plumbing yourself. But the ideas matter more than the library: you can apply them with raw Temporal, and you'll learn something even if you never touch my repo.
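Before bringing in Temporal, here is a toy sketch in plain Python of what "an agent is a durable workflow" means. It is not how Temporal or durable-agents is implemented. The `Journal` class, the `run_agent` function, and the step names are all hypothetical, invented for illustration. The idea it shows is the one Temporal relies on at much greater scale: record each completed step outside the process, and on restart, replay the recorded results instead of redoing the work.

```python
import json
import os
import tempfile


class Journal:
    """Hypothetical helper: an append-only log of completed steps on disk."""

    def __init__(self, path):
        self.path = path
        self.results = {}
        # On startup, load whatever a previous process managed to record.
        if os.path.exists(path):
            with open(path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.results[entry["step"]] = entry["result"]

    def run_step(self, step, fn):
        # Replay: a step that already completed returns its recorded result.
        if step in self.results:
            return self.results[step]
        result = fn()
        # Persist the result before moving on, so a crash cannot lose it.
        with open(self.path, "a") as f:
            f.write(json.dumps({"step": step, "result": result}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.results[step] = result
        return result


def run_agent(journal, calls, crash_before=None):
    """Stand-in for a three-step agent run.

    `calls` collects the names of steps that actually executed (as opposed
    to being replayed). `crash_before` simulates the process dying just
    before the step at that index.
    """
    plan = [("search", "3 results"), ("write_file", "notes.md"), ("summarize", "done")]
    outputs = []
    for index, (name, value) in enumerate(plan):
        if crash_before is not None and index == crash_before:
            raise RuntimeError("simulated process death")

        def do_work(name=name, value=value):
            calls.append(name)
            return value

        outputs.append(journal.run_step(name, do_work))
    return outputs


def demo():
    path = os.path.join(tempfile.mkdtemp(), "run.jsonl")
    calls = []
    try:
        run_agent(Journal(path), calls, crash_before=2)  # dies before step 3
    except RuntimeError:
        pass
    # "Restart": a fresh Journal rebuilt from disk, same agent code.
    final = run_agent(Journal(path), calls)
    return final, calls
```

Calling `demo()` runs the agent, kills it before the third step, then runs it again from a fresh `Journal`. The second run returns all three results, but `calls` shows each step executed only once: the first two were replayed from the log. A real system also has to handle retries, timeouts, non-deterministic code, and concurrent workers, which is the plumbing Temporal provides.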









