Build reliable long-running agents w/ verification, worktrees, skills, subagents, & HIL/review gates

There’s been a lot of buzz around loops and workflows in the past few days. Many people have chimed in that they’ve been doing this all along, and there are plenty of posts about how it works, but far fewer code examples that hold up in production codebases. At the end of the day, the goal is to make long-running agents reliable and steerable inside real codebases. This article shows how to do that and how to build your own, with the actual code provided.

You’ll see how an engine was built to power these loops, or workflows. The two terms are used interchangeably here because the word that happens to be driving the hype doesn’t matter; what matters is how it works and what it does for agent reliability. You’ll also see the path to this architecture, why it was built the way it was, and then the code (skip to whatever part you want).

There’s been good work on properly defining what a loop is, but how do you actually build one, and how do you build it so that it is reliable and doesn’t burn tokens? @mvanhorn had a good writeup on the history of the concept, how developers are employing it, and why there is still a major gap in how AI is used in real-world deployments: you do need a “production” version of a loop.

