Agents-A1 achieves 1T-model performance through long-task training, not bigger parameters

For years, the AI industry has operated under a simple gospel: bigger models equal better results. A team from Shanghai AI Laboratory just published a paper that politely disagrees.

Their model, Agents-A1, packs 35 billion parameters into a Mixture-of-Experts architecture. It matches, and on several benchmarks outperforms, models roughly 30 times its size. The trick wasn’t scaling up. It was training the model on longer, more complex task sequences rather than inflating the parameter count.

How a 35B model punches above its weight

The model goes through a three-stage training protocol. First comes full-domain supervised fine-tuning, where the model learns across a broad set of tasks. Then it trains with domain-level teacher models, essentially learning from specialized experts. Finally, a multi-teacher on-policy distillation stage lets the model absorb knowledge from multiple teachers simultaneously while generating its own outputs.
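
To make the final stage more concrete, here is a minimal sketch of what multi-teacher on-policy distillation can look like. The article does not give the paper's actual loss, so this is an assumption throughout: it uses a weighted reverse KL divergence, and the toy models (`toy_student`, `code_teacher`, `math_teacher`) and the function `on_policy_distill_loss` are hypothetical names invented for illustration. The key idea it shows is that the student samples its own tokens, and every teacher scores the same student-generated prefix.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher); zero when the two distributions agree."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(student_probs, teacher_probs)
    )

def on_policy_distill_loss(student, teachers, steps, rng, weights=None):
    """Average per-step loss over one rollout sampled from the student.

    Assumption: teachers are combined with fixed weights (uniform by default).
    """
    if weights is None:
        weights = [1.0 / len(teachers)] * len(teachers)
    prefix, total = [], 0.0
    for _ in range(steps):
        s_probs = softmax(student(prefix))
        # Every teacher scores the prefix the student itself produced.
        total += sum(
            w * reverse_kl(s_probs, softmax(teacher(prefix)))
            for w, teacher in zip(weights, teachers)
        )
        # On-policy: the next token comes from the student, not a teacher.
        token = rng.choices(range(len(s_probs)), weights=s_probs)[0]
        prefix.append(token)
    return total / steps

# Hypothetical toy "models": each maps a token prefix to logits over 4 tokens.
def toy_student(prefix):
    return [0.1 * len(prefix), 0.0, 1.0, -1.0]

def code_teacher(prefix):
    return [2.0, 0.0, 0.0, 0.0]

def math_teacher(prefix):
    return [0.0, 0.0, 2.0, 0.0]

loss = on_policy_distill_loss(
    toy_student, [code_teacher, math_teacher], steps=5, rng=random.Random(0)
)
print(f"distillation loss: {loss:.4f}")
```

In a real training run this loss would be backpropagated through the student. The sketch only computes the number, which is enough to show where the "on-policy" and "multi-teacher" parts enter.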

There’s also a domain-grounded knowledge-action framework baked into the architecture. This gives the model a structured way to make decisions based on actions, observations, and verified outcomes.
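
The article does not describe how this framework is implemented, so the following is only an illustrative sketch of the general shape such a record might take. The `Step` and `Trajectory` classes and the `verifier` callback are hypothetical names, not taken from the paper. The sketch logs each action with its observation and a flag saying whether the outcome passed a check, so later decisions can rely on verified steps only.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Step:
    action: str        # what the agent did
    observation: str   # what the environment returned
    verified: bool     # whether the outcome passed a check

@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, observation: str,
               verifier: Callable[[str], bool]) -> Step:
        """Store one step, marking it verified if the check passes."""
        step = Step(action, observation, verifier(observation))
        self.steps.append(step)
        return step

    def verified_actions(self) -> List[str]:
        """Actions whose outcomes were confirmed."""
        return [s.action for s in self.steps if s.verified]

# Hypothetical usage: a verifier that accepts observations reporting success.
def succeeded(observation: str) -> bool:
    return "ok" in observation

run = Trajectory()
run.record("run unit tests", "ok: 12 passed", succeeded)
run.record("deploy build", "error: timeout", succeeded)
print(run.verified_actions())
```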
