We've been generating production features with AI for a while now — auth flows, billing hooks, notification handlers. And we've hit a pattern we don't have a good answer to yet.
The first feature the AI generates looks great. It reads the codebase, picks up the patterns, and produces code that reads like a senior dev wrote it.
The tenth feature? Less so. Small inconsistencies creep in:
- A handler that doesn't follow the error-handling convention
- A schema field with a different naming pattern from the rest of the schema