Making AI chatbots helpful weakens their ability to simulate human behavior, large-scale study finds

A large-scale study covering 208,000 participants and 26 million responses shows that the very training that turns language models into helpful chatbots weakens their ability to replicate human behavior. The effect gets worse with each new model generation. Even the popular persona trick, feeding models demographic profiles, brings practically no benefit for individual predictions.

Saturday, 30 May 2026


Language models are increasingly used as stand-ins for human test subjects to predict reactions to policy measures, simulate clinical training for psychiatrists, or model how students learn.

A new study from an international research consortium, including scientists from Helmholtz Munich, arrives at an inconvenient finding: the very training steps that turn language models into useful assistants make them worse at modeling human behavior.

The study builds on Psych-201, a new dataset of transcripts from behavioral experiments. It covers about 208,000 participants and roughly 26 million individual responses from hundreds of experiments, several times larger than any previous collection of its kind.

Each data point captures a participant's full run through an experiment, along with detailed metadata like age, nationality, questionnaire responses, and other traits. The dataset was assembled through an open research collaboration involving researchers from more than 35 institutions.
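To make that structure concrete, here is a minimal Python sketch of what one such record might look like. It also shows how the record's metadata could be turned into a demographic "persona" prefix, roughly the kind of profile-feeding the study tested. The `Record` class, its field names, the `persona_prompt` helper and the sample values are hypothetical illustrations. They are not Psych-201's actual schema or the researchers' code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Record:
    """Hypothetical shape of one participant's full run through one experiment."""
    participant_id: str
    experiment: str
    transcript: str  # text of the whole run, including the participant's responses
    age: Optional[int] = None
    nationality: Optional[str] = None
    questionnaires: Dict[str, float] = field(default_factory=dict)


def persona_prompt(record: Record) -> str:
    """Hypothetical helper: prepend whatever metadata is present to the transcript."""
    traits = []
    if record.age is not None:
        traits.append(f"You are {record.age} years old.")
    if record.nationality:
        traits.append(f"Your nationality is {record.nationality}.")
    # Sort questionnaire names so the prefix is deterministic.
    for name, score in sorted(record.questionnaires.items()):
        traits.append(f"Your {name} score is {score}.")
    # The persona text simply sits in front of the experiment transcript.
    return "\n".join(traits + [record.transcript])


if __name__ == "__main__":
    r = Record("p1", "two-armed bandit", "You chose option A...", age=34,
               nationality="German", questionnaires={"anxiety": 12.0})
    print(persona_prompt(r))
```

According to the study, this kind of persona conditioning brings practically no benefit for individual predictions. The sketch only illustrates the mechanism and is not a recommended technique.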
