The novel power of today’s AI is in its ability to deal with intent. This is a superpower, no doubt, but it creates a huge imperative for app developers: the need to map between the anything-is-possible large language model (LLM) and the strict capabilities of code.
Left unrestrained, LLM endpoints will happily let your users create unicorns and leprechauns while your back end can handle only purchase orders and customer profiles. You must channel the LLM's ability to understand intent into what the app is logically capable of, while keeping context (and therefore spend) under control. Here I'll discuss some practical, realistic techniques for doing that today.
Between what the user wants to do and what your app is capable of is you. Or, more specifically, the mediation layer you build. This layer can sit anywhere on a broad spectrum, from using incredibly lightweight inline strings to using a massive retrieval-augmented generation (RAG) system backed by a vector database. Somewhere in there is the sweet spot for your particular project.
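At the lightweight end of that spectrum, the mediation layer can be little more than a string: the app's capabilities written directly into the system prompt. The following is a minimal Python sketch under that assumption. The action names (`create_purchase_order`, `update_customer_profile`), the `CAPABILITIES` table, and the `build_system_prompt` helper are hypothetical illustrations, not part of any particular framework.

```python
# Hypothetical capability table: the only actions this back end implements.
CAPABILITIES = {
    "create_purchase_order": "Create a purchase order for a known customer.",
    "update_customer_profile": "Change fields on an existing customer profile.",
}


def build_system_prompt(capabilities: dict[str, str]) -> str:
    """Embed the app's capabilities directly in the prompt as plain text."""
    lines = [
        "You map user requests to exactly one of the actions below.",
        "If no action fits, answer with the action name 'unsupported'.",
        "",
        "Actions:",
    ]
    # One short line per capability keeps the prompt (and token spend) small.
    for name, description in capabilities.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(CAPABILITIES))
```

Because the capability list lives in ordinary code, exposing a new back-end feature to the model means adding one entry to the table, and the prompt grows only as fast as the feature list does.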
It turns out there is a great deal you can do without resorting to the extra infrastructure of a vector database, and indeed, you should avoid that infrastructure until it is really, truly needed. The first step in keeping your AI APIs manageable is the response schema.
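A response schema pins the model's output to a shape your code can act on. Below is a hedged sketch in Python, assuming the model has been asked to reply with JSON. `RESPONSE_SCHEMA` uses standard JSON Schema keywords (`type`, `properties`, `enum`, `required`, `additionalProperties`), but its fields and intent names are hypothetical, and the hypothetical `parse_model_response` helper checks only that small subset by hand so the example needs nothing beyond the standard library. Many LLM APIs accept some form of JSON Schema to constrain generation (the supported subset varies by provider); validating the reply on your side is still a sensible backstop.

```python
import json

# Hypothetical response schema: the model may only return an intent the
# back end actually implements, plus an object of arguments for it.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "intent": {
            "type": "string",
            "enum": ["create_purchase_order", "update_customer_profile", "unsupported"],
        },
        "arguments": {"type": "object"},
    },
    "required": ["intent", "arguments"],
    "additionalProperties": False,
}


class SchemaError(ValueError):
    """Raised when a model response does not fit RESPONSE_SCHEMA."""


def parse_model_response(raw: str) -> dict:
    """Parse the model's JSON reply and enforce the schema subset above.

    Raises json.JSONDecodeError for malformed JSON and SchemaError for
    well-formed JSON that does not match the schema.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise SchemaError("response must be a JSON object")
    props = RESPONSE_SCHEMA["properties"]
    # "required": every listed field must be present.
    missing = [key for key in RESPONSE_SCHEMA["required"] if key not in data]
    if missing:
        raise SchemaError(f"missing fields: {missing}")
    # "additionalProperties": False means no fields beyond "properties".
    extra = set(data) - set(props)
    if extra:
        raise SchemaError(f"unexpected fields: {sorted(extra)}")
    # "enum": the intent must be one the app can actually handle.
    if data["intent"] not in props["intent"]["enum"]:
        raise SchemaError(f"unknown intent: {data['intent']!r}")
    if not isinstance(data["arguments"], dict):
        raise SchemaError("arguments must be an object")
    return data
```

With this in place, a request for `summon_unicorn` fails at the boundary with a `SchemaError` instead of reaching a back end that only knows purchase orders and customer profiles. Malformed JSON raises `json.JSONDecodeError`, which, like `SchemaError`, subclasses `ValueError`, so a single `except ValueError` handles both.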