I asked OpenAI's infra lead where AI agents actually break. Here are 2 prompts to find out in your stack.

Nobody meant to break anything. A user asked for something routine, an agent found a path through the system, and a Kafka cluster went down. That’s the part that doesn’t show up in productivity dashboards.

In the same conversation, I heard the opposite version of the same story. A user launched a training-data export job and went to sleep. The job hit a blocker. Instead of waiting for a platform engineer to notice a support ping, the agent inspected several internal systems and found a small bug several layers down. By morning, the job was done.

Those two examples belong together. They show both sides of the frontier. Agents are now useful enough to participate in real operational work. They’re also capable enough to create real operational risk.

That’s the next AI bottleneck. Not whether people use agents, or whether agents can code, or whether the next model wins a benchmark. We’re already past those questions. The real question is what happens when the work starts moving faster than the controls around it.

This came into focus in a recent conversation with Emma, who leads data infrastructure engineering at OpenAI. Her team sits near the bottom of a very large stack: the analytics, streaming, and data infrastructure that almost every other team eventually depends on.

That vantage point matters. If you’re on an application team, you mostly see whether agents help you build faster. If you’re on a platform team, you see what happens when everyone above you starts building faster.

Agents make work happen faster. They don’t make it safer at the same rate.

Here’s what’s inside:

- Agents crossed into operations. They’re not writing code for humans to paste anymore. They run release loops and build features end-to-end while a human watches.
- The work lands somewhere. When app teams accelerate, platform teams inherit the operational burden nobody budgeted for.
- Platform agents play by different rules. Same model, very different blast radius, and the tooling most companies have doesn’t know the difference.
- The practical control layer. Four things that let you absorb agent-created work without becoming the permanent bottleneck everyone routes through.
- The eval discipline most companies skip. A cheap way to know when an agent is ready for more autonomy and when to pull it back.
- Two prompts that build the documents. A private eval suite and a tiered action-class policy, both built from your own systems, not a template.

Let me show you where this is headed, and what to do about it.


