Large language models (LLMs) are the workhorses of AI, supporting ever more sophisticated capabilities and workflows and approaching near-human-level performance.
But more isn’t always better; sometimes it’s just more. Specialized data and limited capabilities are just fine for some workflows.
This realization is driving the rise of small language models (SLMs) as an alternative to one-size-fits-all LLMs. SLMs, which include domain-specific models, statistical language models, and neural language models, are faster, cheaper, less resource-intensive, and more private than traditional LLMs, according to experts.
It’s not simply a replacement story, though. “The pattern is closer to a better division of labor,” says Thomas Randall, a research director at Info-Tech Research Group. “A routing architecture sends simple or well-scoped queries to a specialized small model, and complex queries to a large model.”
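For a concrete picture of that division of labor, here is a minimal Python sketch of a query router, assuming a placeholder complexity heuristic and illustrative model names; a production system would more likely use a trained classifier or the models' own confidence signals to decide the route.

```python
# Minimal sketch of the routing idea Randall describes: well-scoped queries
# go to a small, specialized model; complex queries go to a large model.
# The heuristic and model-tier names are illustrative placeholders.

def estimate_complexity(query: str) -> float:
    """Crude heuristic: longer, multi-part questions score as more complex."""
    words = query.split()
    clauses = query.count("?") + query.count(";") + query.count(" and ")
    return len(words) / 50.0 + clauses * 0.2

def route(query: str, threshold: float = 0.5) -> str:
    """Return the model tier a query should be sent to."""
    return "large-model" if estimate_complexity(query) > threshold else "small-model"

if __name__ == "__main__":
    examples = [
        "What is our refund policy?",
        "Compare our Q3 churn drivers, draft a retention plan, "
        "and summarize the regulatory risks for each option.",
    ]
    for q in examples:
        print(f"{route(q):>11}  <-  {q}")
```

Simple requests like the first example stay on the cheap, fast small model, while the multi-part request is escalated to the large one.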
While LLMs can feature parameter counts in the hundreds of billions — or, increasingly, trillions — SLMs typically fall in the 1 billion to 7 billion parameter range. Generally, anything below 10 billion is considered small.