An experiment shows how Microsoft's AI assistant Copilot applies stereotypes when analyzing data instead of actually reading it. Thinking models solve the task, but sometimes only if users know their tools.

Microsoft Copilot has become the go-to tool for quick data analysis at many companies. But an experiment by mathematician Adam Kucharski shows that when analyzing text data, the tool can spit out results that have nothing to do with the actual data. Instead, it falls back on stereotypes baked into the underlying language model.

For the test, Kucharski created 2,000 simulated free-text responses about emotions and labeled them "UK." He then copied the same 2,000 responses and labeled them "US." The combined 4,000 entries were shuffled and handed to Copilot in "Auto" mode for analysis.

The result: Copilot delivered a detailed summary of how US and UK respondents supposedly differed. "Based on the dataset you shared, US and UK responses differ mainly in tone, intensity, and wording style, even though they express similar emotional states," the tool concluded. But the two sets of responses were identical, so none of the reported differences could have come from the data.
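For readers who want to run a similar check on their own tools, the setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kucharski's actual code: the function names `build_control_dataset` and `groups_identical` are hypothetical, and the response texts are placeholders standing in for the simulated answers, which are not reproduced here.

```python
import random
from collections import Counter


def build_control_dataset(responses, labels=("UK", "US"), seed=0):
    """Copy every response once per label, then shuffle the combined rows."""
    rows = [(label, text) for label in labels for text in responses]
    random.Random(seed).shuffle(rows)  # fixed seed keeps the shuffle reproducible
    return rows


def groups_identical(rows):
    """Return True if every label holds exactly the same multiset of responses."""
    by_label = {}
    for label, text in rows:
        by_label.setdefault(label, Counter())[text] += 1
    counters = list(by_label.values())
    return all(counter == counters[0] for counter in counters)


# Placeholder texts (an assumption) in place of the 2,000 simulated responses.
responses = [f"placeholder response {i}" for i in range(2000)]
rows = build_control_dataset(responses)

print(len(rows), groups_identical(rows))  # prints: 4000 True
```

Because the two groups are identical by construction, any difference an analysis tool reports between them cannot come from the data. That makes a dataset like this a simple control to run before trusting a tool with real responses.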

Copilot sees Italians as artists and Americans as business people