A new viewpoint article published in JMIR Mental Health warns that artificial intelligence (AI) systems used in mental health settings may inherit and reinforce unreliable human input unless new safeguards are adopted. The paper, titled "When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion," calls for the "clinical reliability" of training data to become a core standard for trustworthy AI.
The article explores how large language models, including AI chatbots, are trained using massive amounts of human-written text and feedback. According to author Dr. Hina Tahseen, current discussions about AI safety often focus on harms that happen after deployment, such as misleading advice or emotional dependency. Dr. Tahseen argues that a major issue may begin much earlier, during the collection of human-generated training and preference data.
The viewpoint introduces the psychiatric concept of "collusion," described as the uncritical acceptance of an unreliable account, as a new way to understand AI behavior. The author suggests that AI systems can unintentionally reinforce distorted, inaccurate, or unhealthy information when they are trained to prioritize user approval or unverified human feedback.
