If you build anything that touches user-generated text, sooner or later someone asks: can we just detect the AI-written stuff and filter it out? I spent a while putting tools like GPTZero, Originality.ai, Copyleaks and Turnitin through their paces. Here is the short version of what I found.
Detection is a probability game, not a yes or no
Every detector outputs a likelihood, not a verdict. Under the hood, most of them lean on perplexity (how surprised a language model is by each next token; low perplexity means the text was easy to predict) and burstiness (how much sentence length and structure vary across a passage). Machine-generated text tends to be smooth, with low perplexity. Human text tends to be lumpy. That signal is real, but it is statistical, and statistics produce false positives.
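To make the two signals concrete, here is a minimal sketch of toy versions of each. It assumes you already have per-token natural-log probabilities from some language model (how you get them is out of scope here). It also uses the coefficient of variation of sentence lengths as a crude stand-in for burstiness. None of the commercial tools publish their exact formulas, so treat this as an illustration of the idea, not how any of them actually score text.

```python
import math
import re
import statistics


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from the natural-log probabilities a model assigned to each token.

    Lower values mean the model found the text more predictable.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words.

    0.0 means every sentence has the same length; higher means lumpier prose.
    """
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(lengths) / statistics.mean(lengths)


if __name__ == "__main__":
    smooth = "The cat sat on the mat. The dog sat on the rug. The bird sat on the branch."
    lumpy = "Stop. The cat, having knocked every pen off my desk, finally sat down on the mat. Why?"
    print(burstiness(smooth))  # 0.0: every sentence is six words long
    print(burstiness(lumpy))   # well above 1: one long sentence between two one-word ones

    # A model that assigned probability 0.5 to every token has perplexity of about 2.
    print(perplexity([math.log(0.5)] * 4))
```

Note that a human who happens to write evenly sized, predictable sentences scores just like the "smooth" example. That is exactly where false positives come from.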
The false positive problem is worse than the marketing admits
The failure mode that actually hurts people is flagging genuine human writing as AI-generated. It hits two groups hardest:









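Why does a false positive rate that sounds small matter so much? When most of what you scan is human-written, even a rare false alarm adds up to a large share of all flags. The sketch below applies Bayes' rule to work out how many flagged documents are actually AI-written. The rates in the example are made-up round numbers chosen for illustration. They are not measured figures for any of the tools above.

```python
def flag_precision(fpr: float, tpr: float, ai_share: float) -> float:
    """Fraction of flagged documents that really are AI-written (Bayes' rule).

    fpr:      false positive rate, P(flagged | human-written)
    tpr:      true positive rate,  P(flagged | AI-written)
    ai_share: fraction of scanned documents that are AI-written (the base rate)
    """
    for name, value in (("fpr", fpr), ("tpr", tpr), ("ai_share", ai_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    flagged_ai = tpr * ai_share             # AI text correctly flagged
    flagged_human = fpr * (1.0 - ai_share)  # human text wrongly flagged
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0.0:
        raise ValueError("with these rates nothing is ever flagged")
    return flagged_ai / total_flagged


if __name__ == "__main__":
    # Hypothetical detector: catches 95% of AI text, wrongly flags 1% of human text.
    print(round(flag_precision(fpr=0.01, tpr=0.95, ai_share=0.05), 2))  # 0.83
    print(round(flag_precision(fpr=0.01, tpr=0.95, ai_share=0.01), 2))  # 0.49
```

Put as counts: if 1% of 10,000 submissions are AI-written, this hypothetical detector flags 95 of the 100 AI documents and 99 of the 9,900 human ones. More than half of everyone it flags wrote their own work.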