This article is part of our coverage of the latest in AI research.

Self-distillation has emerged as an effective post-training paradigm for large language models, often improving performance while shortening reasoning traces. However, a recent study by researchers at Microsoft Research, KAIST, and Seoul National University reveals a major flaw in this approach.

In mathematical reasoning, self-distillation inadvertently suppresses behaviors that allow models to explore alternative hypotheses and self-correct during complex problem-solving. As a result, the models become significantly less accurate on out-of-distribution problems.

The key takeaway is that optimizing post-training solely to reinforce concise, correct reasoning traces can quietly destroy a model’s ability to generalize. Across various open-weight models, researchers found that self-distillation can cause performance drops of up to 40% on unseen tasks.

To retain robust reasoning abilities, models must be exposed to varying levels of uncertainty during training.