How 12 AI agent frameworks handle human approval (most badly)

I keep watching teams ship agent systems into production and then discover, on day three, that "the agent needs to wait for a human sometimes" breaks every assumption in their stack. Not because they didn't see it coming; every team plans for HITL (human in the loop). It breaks because almost every popular agent framework reduces "human in the loop" to "block the Python process on input() and hope for the best."
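To make the anti-pattern concrete, here is a minimal sketch of what "block on input()" looks like inside an agent step. It is not taken from any specific framework. The names approve_blocking and run_agent_step are hypothetical, and the ask parameter exists only so the prompt can be swapped out in a test.

```python
from typing import Callable


def approve_blocking(action: str, ask: Callable[[str], str] = input) -> bool:
    """Anti-pattern sketch: the calling thread blocks until someone answers.

    The pending request exists only in this process's memory, so a restart
    or redeploy while it waits loses the request entirely.
    """
    answer = ask(f"Approve {action!r}? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


def run_agent_step(action: str, ask: Callable[[str], str] = input) -> str:
    # Hypothetical agent step: nothing else in this process can make
    # progress on this task until the human replies.
    if not approve_blocking(action, ask):
        return f"skipped: {action}"
    return f"executed: {action}"
```

With the default ask=input this only works when a person is sitting at the terminal the process is attached to, which is rarely the case for a deployed service.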

I spent a day auditing the twelve most popular AI-agent frameworks against a strict production rubric. The results aren't kind. Two frameworks pass. Ten are one production deploy away from breaking.

This post is the receipts.

The rubric (and why it's strict)

A production HITL primitive isn't "the agent can pause for input." That's a 1980s primitive. A production HITL primitive needs six properties:

Related reading

Your AI Agent Passed All Tests — Then Failed in Production. Here's the…

Your AI Agents Ship Code Faster Than You Can Review It. Here's the Workflow…

The Reliability Problem That Forced Us to Rethink AI Agents

The Agent Stack™: Why Your AI Agent Breaks in Production (A 5-Layer Debugging…

The Right Agent at the Right Time

Stop Flying Blind: We Built an LLM Evaluation Framework That Works Across 17+…
