We've been generating production features with AI for a while now — auth flows, billing hooks, notification handlers. And we've hit a pattern we don't have a good answer to yet.

The first feature the AI generates looks great. It reads the codebase, picks up the patterns, and produces something a senior dev could have written.

The tenth feature? Less so. Small inconsistencies creep in:

- A handler that doesn't follow the error-handling convention
- A schema field with a different naming pattern