‘Inconsistent’ AI detection ‘should prompt assessment rethink’

The minor use of large language models (LLMs) by students in their work may be overstated by artificial intelligence (AI) detection tools, according to a paper.

At the same time, the research suggests, the tools may be undercounting a heavier reliance on programs such as ChatGPT.

For the study, published in Education and Information Technologies, researcher Lucky E. Atamhenwan fed 81 sample essays into Turnitin. The scripts ranged from those that were 100 per cent LLM-generated, by ChatGPT, Copilot or Gemini, to those written solely by people.

Turnitin did not flag any of the essays that were 100 per cent human-written as being generated by AI.

And in every instance in which the detector flagged AI-generated words, those samples did indeed contain LLM-generated work.
