Originally published on lavkesh.com
I've watched it happen dozens of times and I've done it myself more than once. You pick a use case, connect it to a model, write a prompt, feed in some sample data, and it works. Not just works. It's impressive. You show it to stakeholders and the energy in the room is real. Someone says 'this is exactly what we needed.' Someone else asks how fast you can ship it.
Six months later, the team is rebuilding it from scratch. Not because the idea was wrong. Because the thing that made the demo work is not the same thing that makes a production system work, and nobody designed for the difference.
The first thing that breaks is evaluation. In the demo, evaluation is the person running the demo. You look at the output, it looks right, you move on. In production, nobody is watching every output. You need automated evaluation, and you need to design for it from the start, which means defining what 'good' looks like before you start building.
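To make that concrete, here is a minimal sketch of what writing 'good' down before building might look like. Everything in it is an assumption for illustration: the `Case` fields, the pass criteria (required terms plus a word limit), and `fake_model`, which stands in for a real model call. Real criteria depend entirely on the use case.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One hypothetical evaluation case: an input plus what 'good' means for it."""
    input_text: str
    must_contain: list[str]  # terms a good output is assumed to mention
    max_words: int           # assumed length limit for a good output


def passes(output: str, case: Case) -> bool:
    """Return True if the output meets every criterion for this case."""
    lowered = output.lower()
    has_terms = all(term.lower() in lowered for term in case.must_contain)
    return has_terms and len(output.split()) <= case.max_words


def pass_rate(generate: Callable[[str], str], cases: list[Case]) -> float:
    """Run every case through the generator and return the fraction that pass."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    passed = sum(passes(generate(case.input_text), case) for case in cases)
    return passed / len(cases)


def fake_model(text: str) -> str:
    """Stand-in for a real model call (assumption, for illustration only)."""
    return f"Summary: {text}"


# Written before the system exists, including an input the demo never covered.
CASES = [
    Case("refund requested for order 1234", ["refund", "1234"], 20),
    Case("customer cannot log in", ["log in"], 20),
    Case("", ["no input"], 20),  # empty input: a good output should say so
]

if __name__ == "__main__":
    print(f"pass rate: {pass_rate(fake_model, CASES):.2f}")
```

The empty-input case is the one the stand-in fails, which is the point: a check like this can run on every change without anyone watching the output.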
The second thing that breaks is the prompt. Prompts in demos are written to work on the examples you have. They have not been tested against the distribution of actual user inputs, which is always stranger and more varied than whatever you planned for. The first week of real usage surfaces things no demo could have predicted.






