When psychologist Raluca Rilla asked volunteers to complete a survey last year, she got the following response to one of her questions: “I don’t experience confusion in the same way humans do.”

Rilla, a PhD student at the Max Planck Institute for Human Development in Berlin, suspects that this is the obvious tip of a large and worrying iceberg — one that could scupper academic research on how people think and behave. She and her colleagues estimate that up to 45% of responses they receive to such surveys are now copied and pasted from the output of large language models (LLMs)¹. In some cases, participants might simply be polishing their language. In others, Rilla thinks that the entire operation — signing up, reading the questions and submitting responses — is handled by a machine. Such answers, and the academic studies built on them, are unlikely to reflect the reality of human nature.

Experimental psychology is not alone in wrestling with the impact of LLMs on research. From political science and economics to opinion polling, researchers across the social sciences are sounding the alarm after finding the fingerprints of artificial intelligence and considering the implications.

Even if AI input into polls can be throttled, there’s a concern at the analysis stage, says David Lazer, a political and computer scientist at Northeastern University in Boston, Massachusetts: AI-assisted analyses in social science might flood journals with spurious findings by rapidly whipping up studies. One journal has already chronicled a vast increase in the number of manuscripts it has received that were wholly or mostly prepared using AI tools².

The explosion in the use and power of AI models touches researchers across all academic fields. But the impact on the social sciences is especially acute, says Joshua Tucker, a political scientist at New York University.
That’s because, compared with other disciplines, much social-science research is heavily reliant on survey data and analysis. And when researchers aren’t gathering the data themselves, they are often analysing large, general data sets, such as censuses or other huge surveys that were usually collected for a different original purpose. This means that apparent signals in the data can be plucked from noise in a way that isn’t possible with experimental data obtained in narrow tests to check a hypothesis — information that tends to have a single use and a defined shelf life.

“I think we’re approaching a time where the trust in behavioural and social sciences will be undermined by this constant threat of LLM pollution,” says Björn Hommel, a psychologist at Leipzig University, Germany. “And there’s nothing that we are able to do about it right now.”

But it’s not all doom and gloom. An alternative view of the latest AI systems is that they could transform social science by making its findings more robust. The same algorithms that can be used for superficial work such as polishing language can also source and analyse complex data sets quickly and, by toggling through statistical techniques, check how sensitive an individual finding is to various analytical methods. AI-assisted review could help to spot methodological errors, and social-science journals might insist on the use of more-robust methods as AI makes it easier for researchers to attempt them.

“We shouldn’t gloss over the benefits of AI, and it is opening up the possibility to do so much interesting research,” says Tucker.

Paradox of productivity

The most immediate problem is paradoxical: the technology can vastly boost productivity.

In April, the journal Organization Science, which publishes social-science studies of organizations, reported a 42% increase in the number of manuscripts submitted to the journal since November 2022, when ChatGPT was first publicly released.
Editors analysed the manuscripts with an LLM-detection tool from the firm Pangram Labs in New York City, and found that the rise was mostly driven by AI². By this February, nearly one-third of submissions contained text in the abstract that was mostly or wholly AI-generated; another 40% contained text that was partly AI-written (see ‘The rise of AI use in a social-sciences journal’).

[Figure: ‘The rise of AI use in a social-sciences journal’. Source: Ref. 1]

Political scientist and journal editor Kevin Munger at the European University Institute in Florence, Italy, has predicted 50% increases in submissions to leading political-science journals this year (see go.nature.com/4achvqc). And the preprint server for psychology research, PsyArXiv, got such a flood of papers that it had to include checks by humans earlier in its screening processes, says Jamie Cummins, a meta-scientist at the University of Bern, who works as a moderator for the site.

Social science is not alone in struggling with this issue. But Tucker and Lazer worry that, because much of the field relies on survey analysis, it is unusually susceptible to the rapid AI-based production of fragile research.

In an interview with Nature, Lazer demonstrated how to use LLMs to rapidly whip up a convincing, if thin, research paper. The paper was based on analysis of data collected by the Civic Health and Institutions Project’s 50 States Survey (CHIP50), a US initiative that measures public trust and institutional legitimacy. “We’ve surveyed roughly a million people on many different topics over the past six years,” Lazer says.

Some months ago, he and his team asked respondents about their use of GLP-1 agonists — drugs developed to treat diabetes that are now known to help with weight loss. A quick look at the results suggested that the biggest users of the medications are not necessarily those who have clinical needs such as diabetes or obesity.
Lazer showed it is possible to get an LLM to write up that observation in an hour as a 28-page academic paper, with a literature review, tabulated results drawn directly from the CHIP50 data set, and convincing graphs and figures.

It might well be a legitimate finding — but to Lazer, that is not the point. “What am I doing? Am I outsourcing some of my cerebellum, some of my essential creative capacity to the AI? And the answer is sort of yes, and it is honestly just emotionally distressing,” he says.

Lazer hasn’t submitted manuscripts written this way to journals, he adds. “I’m trying to evaluate what one can do and then wrestling with the question of what one should do.”

Survey pollution

For Rilla and others, the growing pollution of survey data by LLMs is a thornier problem, and one that more specifically affects the social sciences. When surveys are distributed on crowdsourcing platforms such as Amazon Mechanical Turk and Prolific, which pay volunteers small amounts of money for their answers, there’s an incentive to cheat the system. Many social scientists hope that there are ways online surveys can be rescued.

Like some other scientists³, Rilla has introduced a series of checks into her research, known as honeypots, that can detect the use of LLMs and enable her team to reject survey answers. The honeypots range from lines of vanishingly small text in the source code of survey questions that would pass into copy-and-pasted responses to hidden instructions for AIs to simply respond with a string of Xs.

It’s an arms race, she says: as the LLMs get more sophisticated and more able to conceal their tracks, researchers will have to find ways to beat them. For the most important studies that rely on human responses, scientists might have to return to corralling cohorts of volunteers and physically supervising them while they complete surveys.
(The biases and lack of diversity in such samples are why web-based surveys of the general population were developed in the first place.)

One response to the difficulty of finding human participants has been to introduce ‘silicon samples’. The term was coined in a 2022 study⁴ by US researchers that showed how an LLM trained using the real socio-demographics of a population — including age, race, gender and political affiliation — is able to generate ‘virtual populations’ of survey respondents.

“You basically ask it to assume certain properties. ‘Please give me data as if I had done a survey on 1,000 Swiss people,’” says Malte Elson, a psychologist also at the University of Bern.

In theory, these samples might allow hard-to-reach populations to be modelled cheaply and quickly and then ‘asked’ for their opinions.

Some survey companies now offer synthetic participants as a commercial service, and they’re used in marketing research. Elson and others are worried that the approach could be widely adopted in social science.

Cummins’s research on silicon samples⁵ has shown that, depending on how you configure the model — adjusting parameters such as ‘temperature’, which controls how variable the output is — you can obtain almost any result you want. He emphasizes that researchers wouldn’t necessarily be looking to manipulate outcomes, but that the effect would be to produce a wide variety of answers.

Elson takes a harsher view. “You basically get to dictate that it should give you results that either support or reject your hypothesis,” he says. “Right now, it’s indistinguishable from fraud.”