Researchers pinpoint why larger language models pick up skills that small ones miss

A new study suggests that instead of endlessly inflating models, it may be more efficient to increase the frequency of specific tasks in training data to anchor rare skills in smaller models.

A new study from researchers at Anthropic, Stanford, and other institutions explains why larger language models learn certain tasks that smaller ones fail at. The finding goes beyond the conventional wisdom that big models simply learn faster.

In some cases, small models can't reliably learn rare tasks even with extremely long training runs. Even well-known scaling laws show that a small model never reaches the loss of a large one, no matter how much data you throw at it.
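The scaling-law point can be made concrete with a small sketch. It assumes the widely used parametric form in which loss is an irreducible term plus one term that shrinks with parameter count and one that shrinks with training tokens. The function names and every constant below are illustrative placeholders, not values from the study.

```python
# Sketch of a parametric scaling law: L(N, D) = E + A / N**alpha + B / D**beta
# N = parameter count, D = training tokens.
# All constants are illustrative placeholders, not fitted values from the study.

def loss(n_params, n_tokens, E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    """Predicted loss for a model of n_params trained on n_tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta


def loss_floor(n_params, E=1.7, A=400.0, alpha=0.34):
    """Limit of loss() as n_tokens grows without bound: the data term vanishes."""
    return E + A / n_params**alpha


small, large = 1e8, 1e10  # hypothetical model sizes (parameters)

for tokens in (1e9, 1e12, 1e15):
    print(f"tokens={tokens:.0e}  small={loss(small, tokens):.3f}  "
          f"large={loss(large, tokens):.3f}")

print(f"floor small={loss_floor(small):.3f}  floor large={loss_floor(large):.3f}")
```

Under these assumptions, adding data only shrinks the last term. The parameter term stays fixed, so the small model's loss levels off above the large model's no matter how many tokens it sees.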

Only the larger OLMo models reliably learn the rarely interspersed tasks, as the orange fields at the bottom right of the plots for both tasks show. | Image: Huang et al.

Common tasks crowd out rare ones
