When AI Hosts Hallucinate: Failure Modes We've Seen and How Three-Tier Review Catches Them
By the KAVANA engineering team — June 2026
The word hallucination in AI discussions almost always refers to factual hallucination: a language model that confidently states something false. A chatbot that invents a court case. A research assistant that fabricates a citation. This failure mode is real, broadly understood, and the subject of considerable engineering effort.
In broadcast, we deal with a different class of AI failure, one that is less widely discussed because it arises only in speech synthesis: acoustic and prosodic hallucination. The AI host that says accurate words in a voice that makes them sound wrong. The synthesis that renders a year as a sequence of digits instead of a phrase. The prosody that collapses on a sentence with three embedded clauses and emerges sounding as though it had been read by someone who did not understand it. The speaker disfluency that appears from nowhere in the middle of an otherwise clean segment.
These failures do not show up in a text transcript. They cannot be caught by a factual verification pipeline. They require someone to listen, or a system designed specifically to detect acoustic anomalies, or both. This post describes the acoustic and prosodic failure modes we have encountered in production, how they interact with factual failures, and how the three-tier review architecture we use catches them at different stages.
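To make the year-as-digits failure concrete, here is a minimal sketch of the two readings a synthesis front end could produce for the same token. It is an illustration only, not our production normalizer: the function names `year_as_digits` and `year_to_words` are hypothetical, and the spoken-year conventions encoded here ("twenty twenty-six", "nineteen oh five", "two thousand five") are one common English style among several.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def year_as_digits(year: int) -> str:
    """The failure mode: every digit read out on its own."""
    return " ".join(ONES[int(ch)] for ch in str(year))


def year_to_words(year: int) -> str:
    """One plausible spoken form for a four-digit year (assumed convention)."""
    if not 1000 <= year <= 9999:
        raise ValueError("this sketch only handles four-digit years")
    century, rest = divmod(year, 100)
    if rest == 0:
        if century % 10 == 0:
            return f"{ONES[century // 10]} thousand"      # 2000
        return f"{two_digits(century)} hundred"           # 1900
    if rest < 10:
        if century % 10 == 0:
            return f"{ONES[century // 10]} thousand {ONES[rest]}"  # 2005
        return f"{two_digits(century)} oh {ONES[rest]}"   # 1905
    return f"{two_digits(century)} {two_digits(rest)}"    # 2026


if __name__ == "__main__":
    print(year_as_digits(2026))  # two zero two six
    print(year_to_words(2026))   # twenty twenty-six
```

Both outputs come from the same correct transcript token, "2026", which is why a text-level check sees nothing wrong: the error exists only in what is spoken.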






