What I learned testing AI text detectors in 2026 (they still get it wrong)

If you build anything that touches user-generated text, sooner or later someone asks: can we just detect the AI-written stuff and filter it out? I spent a while putting tools like GPTZero, Originality.ai, Copyleaks and Turnitin through their paces. Here is the short version of what I found.

Detection is a probability game, not a yes or no

Every detector outputs a likelihood, not a verdict. Under the hood, most of them lean on two signals: perplexity (how predictable each next token is to a language model) and burstiness (how much sentence length and structure vary). Machine-generated text tends to be smooth and low-perplexity; human text tends to be lumpy. That signal is real, but it is statistical, and any statistical cutoff will produce false positives.
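
To make the two signals concrete, here is a minimal Python sketch. It is not how any of the tools above work internally. Perplexity here comes from a toy add-one-smoothed unigram model built from a reference text (an assumption standing in for a real language model's token probabilities, with whitespace-split words instead of real tokens). Burstiness is measured as the coefficient of variation of sentence lengths. The `looks_machine_generated` function and its 0.3 cutoff are hypothetical, picked only for illustration.

```python
import math
import re
from collections import Counter


def sentence_lengths(text: str) -> list[int]:
    """Split on runs of . ! ? and count whitespace-separated words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation (std / mean) of sentence lengths.

    Higher values mean sentence lengths vary more ("lumpier" text).
    Returns 0.0 when there are fewer than two sentences to compare.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return math.sqrt(variance) / mean


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`.

    Toy stand-in (an assumption) for the LM token probabilities real detectors use.
    Lower perplexity means the text is more predictable to the model.
    """
    ref_tokens = reference.lower().split()
    counts = Counter(ref_tokens)
    vocab = len(counts) + 1  # +1 bucket shared by all unseen words
    total = len(ref_tokens)
    tokens = text.lower().split()
    if not tokens:
        return float("inf")
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def looks_machine_generated(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical rule: treat low burstiness as "machine-like"."""
    return burstiness(text) < threshold


if __name__ == "__main__":
    human = ("I tried it. Then, after a long and frankly annoying afternoon of "
             "re-running everything, it failed again. Odd.")
    smooth = "The tool works well. The tool is easy to use. The tool gives clear results."
    print(round(burstiness(human), 2), round(burstiness(smooth), 2))
    print(looks_machine_generated(human), looks_machine_generated(smooth))
```

Note that the flag only looks at burstiness, so a person who happens to write in evenly sized sentences gets flagged too. That is the false positive problem in miniature.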

The false positive problem is worse than the marketing admits

The failure mode that actually hurts people is flagging genuine human writing as AI. It hits two groups hardest:
