Remember being excited about (or dreading, depending on the stage of your career and the company you worked at) writing unit tests? Or sweating every detail of the end-to-end and integration tests you were sure covered all the use cases your users would hit?

These days a lot of UIs are slowly being replaced by a single input field and an agent that promises to deliver the same value a UI would, but with the elegance and pun-ness of a “Jarvis”.

We craft their SOUL.md, their MEMORY.md, and the system prompt. We pretend we know what we’re doing when we set up evals with prompts we know are not how our users will interact with the agent. But we set the threshold, the confidence score comes back satisfactory, and we approve and deploy. Job’s done, right?

Not quite.

Sentry is attending the AI Engineer World’s Fair this week, and I decided to build a little agent-powered schedule builder to help people put together their itineraries. (Shout-out to Swyx for providing the data, and even the embeddings, for all the speakers, talks, and tracks.)