Researchers uncover how popular parameter-efficient finetuning techniques balance learning new tasks against forgetting existing capabilities.

A new evaluation framework is challenging how the AI industry assesses parameter-efficient finetuning (PEFT), the dominant approach for adapting large language models to specialized tasks. By focusing solely on downstream performance, the researchers argue, the field has overlooked a critical tension: the balance between learning new skills and retaining pretrained knowledge.

According to a paper posted on arXiv by Yangyi Huang, Ruotian Peng, Zeju Qiu, Jiale Kang, Yandong Wen, Bernhard Schölkopf, and Weiyang Liu, this oversight has led to incomplete comparisons between competing PEFT methods. The team introduces PEFT-Arena, a benchmark designed to measure both how well models perform on new tasks and how well they preserve their general capabilities.

A Classical Problem in Modern Form

The researchers frame their investigation around the stability-plasticity dilemma, a well-studied concept in neuroscience and machine learning. Plasticity refers to a system's ability to adapt to new information, while stability describes its resistance to forgetting what it already knows. PEFT methods occupy different positions along this spectrum, yet existing benchmarks typically reward only plasticity.
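
To make the two quantities concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the function names, the definitions (plasticity as the score gain on the new task, stability as the fraction of the general-capability score retained), and the numbers are hypothetical and are not the metrics or results reported for PEFT-Arena.

```python
def plasticity(base_new: float, tuned_new: float) -> float:
    """Score gained on the new task after finetuning (assumed definition)."""
    return tuned_new - base_new


def stability(base_general: float, tuned_general: float) -> float:
    """Fraction of the general-capability score retained (assumed definition)."""
    if base_general <= 0:
        raise ValueError("base_general must be positive")
    return tuned_general / base_general


# Hypothetical scores in [0, 1]; not taken from any real benchmark run.
base = {"new": 0.40, "general": 0.60}
tuned = {
    "method_a": {"new": 0.70, "general": 0.48},  # learns more, forgets more
    "method_b": {"new": 0.62, "general": 0.57},  # learns less, forgets less
}

for name, scores in tuned.items():
    p = plasticity(base["new"], scores["new"])
    s = stability(base["general"], scores["general"])
    print(f"{name}: plasticity={p:.2f}, stability={s:.2f}")
```

In this made-up example, a leaderboard that ranks only by new-task score would put method_a first, while a view that also tracks retention shows what that gain cost.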