Every team we talk to has a version of the same story. They built an LLM integration that works well in testing. Then, three weeks into production, something comes back slightly different — the model wraps the JSON in a code block, or uses "status": "Completed" instead of "status": "complete", or includes an extra key that breaks the downstream parser. The whole pipeline falls over.
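To make those three failures concrete, here is a minimal sketch of the kind of strict parser that passes in testing and then breaks. The function `parse_naively`, the expected key set, and the allowed status values are hypothetical. They are chosen only to mirror the examples in the story, not taken from any real pipeline.

```python
import json

# Hypothetical contract for a downstream consumer: bare JSON, exactly these
# keys, and a lowercase status drawn from a fixed set.
EXPECTED_KEYS = {"id", "status"}
ALLOWED_STATUSES = {"pending", "complete"}
FENCE = "`" * 3  # a Markdown code fence, built up so it does not end this block


def parse_naively(raw: str) -> dict:
    """Strict parse-and-check of the kind that passes in testing."""
    data = json.loads(raw)  # raises json.JSONDecodeError on fenced output
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ EXPECTED_KEYS)}")
    if data["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {data['status']!r}")
    return data


# The response seen in testing, followed by the three drifts from the story.
responses = {
    "as tested": '{"id": 1, "status": "complete"}',
    "wrapped in code block": f'{FENCE}json\n{{"id": 1, "status": "complete"}}\n{FENCE}',
    "different status value": '{"id": 1, "status": "Completed"}',
    "extra key": '{"id": 1, "status": "complete", "note": "done"}',
}

results = {}
for label, raw in responses.items():
    try:
        parse_naively(raw)
        results[label] = "ok"
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        results[label] = f"{type(exc).__name__}: {exc}"

for label, outcome in results.items():
    print(f"{label:>24}: {outcome}")
```

Only the first response gets through. Each of the other three trips a different check: the fenced one fails in `json.loads` itself, the capitalized status fails the value check, and the extra key fails the key check.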

This post is about how we handle that problem — specifically, how we use structured outputs to get reliable, typed data from LLMs in production Django applications, and where the approach still has limits.

The problem with parsing free-text LLM responses

When you ask an LLM to "return JSON", it usually does. Until it doesn't.

The failure modes are predictable once you've seen them enough times: