For years, the AI industry has operated under a simple gospel: bigger models equal better results. A team from Shanghai AI Laboratory just published a paper that politely disagrees.

Their model, Agents-A1, packs 35 billion parameters into a Mixture-of-Experts architecture. It matches, and on several benchmarks outperforms, models roughly 30 times its size. The trick wasn’t scaling up. It was scaling out: training the model on longer, more complex task sequences rather than inflating the parameter count.

How a 35B model punches above its weight

Training follows a three-stage protocol. First comes full-domain supervised fine-tuning, where the model learns across a broad set of tasks. Then it trains with domain-level teacher models, essentially learning from specialized experts. Finally, a multi-teacher on-policy distillation stage has the model generate its own outputs and absorb knowledge from multiple teachers simultaneously.
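To make the third stage more concrete, here is a minimal toy sketch of what multi-teacher on-policy distillation can look like. It is not the paper's implementation. The reverse KL loss, the equal teacher weights, and every name here (`multi_teacher_loss`, `on_policy_rollout`, `student_fn`, `teacher_fns`) are assumptions made for illustration. The key idea it shows is that the student samples its own tokens, and the teachers score the student's distribution at each step.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher). Assumed loss; the article does not name one."""
    return sum(
        s * math.log(s / t)
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )


def multi_teacher_loss(student_logits, teacher_logits_list, weights=None):
    """Weighted average of the student's divergence from each teacher."""
    n = len(teacher_logits_list)
    weights = weights or [1.0 / n] * n  # assumption: equal weights by default
    student = softmax(student_logits)
    return sum(
        w * reverse_kl(student, softmax(t))
        for w, t in zip(weights, teacher_logits_list)
    )


def on_policy_rollout(student_fn, teacher_fns, prompt, steps, rng):
    """Sample tokens from the student itself and score each step with all teachers.

    student_fn and each teacher_fn are hypothetical callables that map a
    token context (list of ints) to a list of logits over the vocabulary.
    """
    context = list(prompt)
    losses = []
    for _ in range(steps):
        s_logits = student_fn(context)
        t_logits = [teacher(context) for teacher in teacher_fns]
        losses.append(multi_teacher_loss(s_logits, t_logits))
        # On-policy: the next token comes from the student, not a teacher.
        probs = softmax(s_logits)
        token = rng.choices(range(len(probs)), weights=probs)[0]
        context.append(token)
    return sum(losses) / len(losses), context
```

In a real training loop the averaged loss would be backpropagated through the student. The sketch stops at computing it, which is enough to show how several teachers can supervise a single student-generated sequence.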

There’s also a domain-grounded knowledge-action framework baked into the architecture. This gives the model a structured way to make decisions based on actions, observations, and verified outcomes.
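One way to picture the actions, observations, and verified outcomes is as a simple record that an agent keeps for each step of a task. The sketch below is purely illustrative. The `Step` and `Trajectory` classes and the `verifier` callback are hypothetical names, not structures described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str        # what the agent did
    observation: str   # what the environment returned
    verified: bool     # whether the outcome passed a check


@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)

    def record(self, action, observation, verifier):
        """Store one step, marking it verified if the (hypothetical) verifier agrees."""
        step = Step(action, observation, bool(verifier(action, observation)))
        self.steps.append(step)
        return step

    def verified_steps(self):
        """Keep only steps whose outcomes were checked and confirmed."""
        return [s for s in self.steps if s.verified]
```

For example, a verifier that checks whether a test run printed "passed" would let only confirmed steps feed later decisions, while unverified ones remain in the log for inspection.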