Large language models (LLMs) are the workhorses of AI, supporting ever more sophisticated capabilities and workflows and approaching near-human-level performance.
But more isn’t always better; sometimes it’s just more. Specialized data and limited capabilities are just fine for some workflows.
This realization is driving the rise of small language models (SLMs) as an alternative to one-size-fits-all LLMs. SLMs, which include domain-specific models, statistical language models, and neural language models, are faster, cheaper, less resource-intensive, and more private than traditional LLMs, according to experts.
It’s not simply a replacement story, though. “The pattern is closer to a better division of labor,” says Thomas Randall, a research director at Info-Tech Research Group. “A routing architecture sends simple or well-scoped queries to a specialized small model, and complex queries to a large model.”
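For a concrete picture of that division of labor, here is a minimal Python sketch of a query router, assuming a placeholder complexity heuristic and illustrative model names; a production system would more likely use a trained classifier or the models' own confidence signals to decide the route.

```python
# Minimal sketch of the routing idea Randall describes: well-scoped queries
# go to a small, specialized model; complex queries go to a large model.
# The heuristic and model-tier names are illustrative placeholders.

def estimate_complexity(query: str) -> float:
    """Crude heuristic: longer, multi-part questions score as more complex."""
    words = query.split()
    clauses = query.count("?") + query.count(";") + query.count(" and ")
    return len(words) / 50.0 + clauses * 0.2

def route(query: str, threshold: float = 0.5) -> str:
    """Return the model tier a query should be sent to."""
    return "large-model" if estimate_complexity(query) > threshold else "small-model"

if __name__ == "__main__":
    examples = [
        "What is our refund policy?",
        "Compare our Q3 churn drivers, draft a retention plan, "
        "and summarize the regulatory risks for each option.",
    ]
    for q in examples:
        print(f"{route(q):>11}  <-  {q}")
```

Simple requests like the first example stay on the cheap, fast small model, while the multi-part request is escalated to the large one.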
While LLMs can feature parameter counts in the hundreds of billions — or, increasingly, trillions — SLMs typically fall in the 1 billion to 7 billion parameter range. Generally, anything below 10 billion is considered small.