Everyone is talking about hallucination. That's the wrong diagnosis.

Hallucination isn't a bug. It's the mechanism. Turn the temperature all the way down and the model gets more repeatable, not more truthful: it still confabulates, just consistently, and it loses much of the variety that made it useful. What people call hallucination is that same creativity showing up in a context that needed correctness. But all LLM output is hallucination in some form: generated token by token, probabilistically, with no built-in check against ground truth, just with enough structure and guardrails that most of it lands close enough to be acceptable.

The creativity and the confabulation are the same thing. Different temperature, different context, different guardrails.
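To make "same thing, different temperature" concrete, here is a minimal sketch of temperature sampling over a toy next-token distribution. The logits are invented for illustration (they assume a model that slightly prefers a wrong answer), and `sample_token` is a hypothetical helper, not any library's API.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick one token from a dict of token -> logit at the given temperature."""
    if temperature <= 0:
        # Greedy decoding: always the single highest-scoring token.
        return max(logits, key=logits.get)
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Made-up scores for the token after "The capital of Australia is".
# Assumption: the model slightly prefers the wrong city.
logits = {"Sydney": 2.0, "Canberra": 1.8, "Melbourne": 0.5}

rng = random.Random(0)
greedy = [sample_token(logits, 0.0, rng) for _ in range(5)]
varied = {sample_token(logits, 1.5, rng) for _ in range(200)}
```

In this toy setup, `greedy` is the same wrong city five times and `varied` is a mix. Nothing in the sampling step consults a source of truth. Temperature only changes how often the less likely tokens get picked.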

Which means "reducing hallucination" is the wrong goal. You can't reduce it without reducing the model. The right goal is routing: giving LLMs only the problems where that generative, probabilistic process is actually what you need.

Using a non-deterministic tool where a deterministic one would do the job perfectly is what breaks agentic systems.
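A rough sketch of what that routing might look like, under heavy assumptions: the only deterministic tool here is a small arithmetic evaluator, and `call_llm` is a hypothetical placeholder for whatever model client you actually use. A real router would cover many more task types.

```python
import ast
import operator

# Binary operators the deterministic path is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arithmetic(expr):
    """Evaluate +, -, *, / over numeric literals; raise ValueError otherwise."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("not plain arithmetic")

    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not plain arithmetic") from exc
    return walk(tree)


def call_llm(prompt):
    """Hypothetical stand-in for a real model call."""
    return f"<generated text for: {prompt}>"


def route(task):
    """Try the deterministic tool first; fall back to the model otherwise."""
    try:
        return ("deterministic", eval_arithmetic(task))
    except ValueError:
        return ("llm", call_llm(task))
```

With this sketch, `route("17 * 23 + 4")` returns `("deterministic", 395)` without touching a model, while `route("Write a tagline for a bakery")` falls through to the generative path. The ordering is the design choice: the model is the fallback, not the default.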