Most AI app demos make the same mistake:

they treat the model like the application.

Prompt in, answer out, maybe a few tool calls, and then everybody acts surprised when the thing starts behaving strangely in production.

The problem is not that the model is useless.

The problem is that we keep asking a probabilistic component to behave like deterministic workflow code.