Small language models fail at rare tasks because frequent ones constantly overwrite what they've learned. A new study with models ranging from 4 million to 4 billion parameters shows this mechanism in detail and offers a practical fix: instead of scaling up models, it may be enough to increase how often the target task appears in the training data.
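The practical fix, raising the target task's share of the training data, can be pictured as reweighting a data mixture. The sketch below is a minimal illustration under stated assumptions, not the study's code: the helpers `upsample_target` and `sample_tasks`, the task names, the dataset sizes, and the 10% target share are all hypothetical. It gives the target task a fixed share of the sampling mass and scales the other tasks down proportionally.

```python
import random
from collections import Counter

def upsample_target(sizes, target, target_share):
    """Return per-task sampling weights in which `target` receives `target_share`
    of the mixture and the remaining tasks split the rest in proportion to size.
    Hypothetical helper for illustration only."""
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must be strictly between 0 and 1")
    others = {t: n for t, n in sizes.items() if t != target}
    total_others = sum(others.values())
    weights = {t: (1.0 - target_share) * n / total_others for t, n in others.items()}
    weights[target] = target_share
    return weights

def sample_tasks(weights, n, seed=0):
    """Draw the task label for each of n training examples according to weights."""
    rng = random.Random(seed)
    tasks = list(weights)
    return Counter(rng.choices(tasks, weights=[weights[t] for t in tasks], k=n))

# Hypothetical corpus in which the target task makes up 1% of the examples.
sizes = {"common_a": 600_000, "common_b": 390_000, "rare_target": 10_000}
natural = {t: n / sum(sizes.values()) for t, n in sizes.items()}
boosted = upsample_target(sizes, "rare_target", target_share=0.10)

natural_counts = sample_tasks(natural, 10_000)
boosted_counts = sample_tasks(boosted, 10_000)
print("natural:", natural_counts["rare_target"], "target examples per 10k")
print("boosted:", boosted_counts["rare_target"], "target examples per 10k")
```

Raising `target_share` comes at a cost: every other task now shares only `1 - target_share` of the mixture. How far to push it is an empirical question this sketch does not answer.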


New research from Stanford, MIT, Harvard, and Anthropic explains why larger AI models learn rare tasks better: they suffer less gradient interference.
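One common way to quantify gradient interference is the cosine similarity between two tasks' gradients on shared parameters. When it is negative, a step that lowers one task's loss tends to raise the other's. The toy below illustrates that idea only; it is not the study's setup or metric. The two-weight linear model, the single data point per task, and the learning rate are hypothetical choices made so the numbers are easy to follow.

```python
import math

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    """Mean squared error of the linear model y_hat = w . x on (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Analytic gradient of mse with respect to the shared weights w."""
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(data)
    return g

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical tasks that share the same two weights.
frequent = [((1.0, 0.5), 1.0)]
rare = [((1.0, -1.0), -1.0)]
w0 = [0.0, 0.0]

g_freq = mse_grad(w0, frequent)
g_rare = mse_grad(w0, rare)
interference = cosine(g_freq, g_rare)  # negative => the tasks pull w apart

# One gradient-descent step on the frequent task only.
lr = 0.1
w1 = [wi - lr * gi for wi, gi in zip(w0, g_freq)]

print(f"gradient cosine: {interference:.3f}")
print(f"frequent-task loss: {mse(w0, frequent):.3f} -> {mse(w1, frequent):.3f}")
print(f"rare-task loss:     {mse(w0, rare):.3f} -> {mse(w1, rare):.3f}")
```

The step improves the frequent task while raising the rare task's loss. Because the frequent task supplies most of the training steps, repeated steps like this one keep undoing the rare task's progress. The sketch does not model how the study links the size of this effect to parameter count.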