i run 35 specialized claude code agents across my projects. most of what's written about AI agents in 2026 is either marketing (look how much they can do) or doom (look how much they'll replace). both miss the practical layer: where do these agents consistently fail, even with the best prompts, the best context, the best tools?
this is that list, drawn from running these agents across 3 production codebases over the last 6 months. specific failures, not abstract concerns.
judgment under partial information
biggest single category. AI agents fail when the right action requires waiting, choosing not to act, or saying "i need more info."
client message: "can you make the dashboard faster?" the agent reads the request, looks at the dashboard code, identifies three optimization opportunities, and starts implementing. a senior engineer reads the same message and asks: "faster for whom? on what data volume? slow on initial load or on filter operations? what's the SLA?"
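to make that gap concrete, here is a minimal sketch of an "ask before acting" gate. it is an illustration, not how my agents are actually wired: `REQUIRED_FACTS`, `Request`, and `next_step` are hypothetical names, and the checklist is just the senior engineer's four questions from the example.

```python
from dataclasses import dataclass, field

# Hypothetical checklist: the four questions from the dashboard example.
# Keys are made-up labels for the facts each question would establish.
REQUIRED_FACTS = {
    "audience": "faster for whom?",
    "data_volume": "on what data volume?",
    "slow_path": "slow on initial load or on filter operations?",
    "sla": "what's the SLA?",
}


@dataclass
class Request:
    """A client message plus whatever facts have been confirmed so far."""
    message: str
    known_facts: dict[str, str] = field(default_factory=dict)


def next_step(request: Request) -> tuple[str, list[str]]:
    """Return ("ask", open_questions) while facts are missing, else ("act", [])."""
    missing = [
        question
        for fact, question in REQUIRED_FACTS.items()
        if fact not in request.known_facts
    ]
    return ("ask", missing) if missing else ("act", [])


# Usage: the vague request from the example should produce questions, not code.
vague = Request("can you make the dashboard faster?")
decision, questions = next_step(vague)
```

the hard part is not the gate itself. it is knowing which facts belong on the checklist for a given request, and that is the judgment the agents tend to skip.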












