Most hallucination detection approaches tell you to train another model. I did not want to do that. I used four statistical signals, a combined score, and a tunable threshold. No fine-tuning. No GPU. No external API. Tested on 10,000 real examples from the HaluEval dataset.
Soft flag result: precision 0.71, recall 0.96.
Strict flag result: precision 1.00, recall 0.38.
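One way to read these two results is as two cutoffs on the same combined score: a low cutoff catches nearly every hallucination but raises some false alarms, and a high cutoff flags only the cases it is sure about. Below is a minimal sketch of that trade-off. It assumes the combined score lies in [0, 1] and that higher means more likely hallucinated. The function name `precision_recall`, both threshold values, and the toy data are illustrative assumptions, not the code or numbers behind the HaluEval results above.

```python
from typing import Sequence, Tuple


def precision_recall(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Flag every example whose score is >= threshold.

    labels use 1 for a hallucinated example and 0 for a faithful one.
    Returns (precision, recall) for the flagged set.
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, made up for illustration only.
scores = [0.95, 0.90, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

# Hypothetical cutoffs; the real thresholds are tunable.
SOFT_THRESHOLD = 0.50
STRICT_THRESHOLD = 0.85

soft = precision_recall(scores, labels, SOFT_THRESHOLD)
strict = precision_recall(scores, labels, STRICT_THRESHOLD)

print(f"soft:   precision={soft[0]:.2f}, recall={soft[1]:.2f}")
# soft:   precision=0.80, recall=1.00
print(f"strict: precision={strict[0]:.2f}, recall={strict[1]:.2f}")
# strict: precision=1.00, recall=0.50
```

On the toy data the pattern matches the shape of the real numbers: lowering the cutoff buys recall at the cost of precision, and raising it does the reverse.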
Here’s how it works.
Why Not Just Use a Model?
