Bigger LLMs will no longer be performant

Recently, I came across an essay titled "On the Death of Scaling" by Sara Hooker (co-founder of Adaption Labs). In it, she explains the shortcomings of the simple path frontier labs have followed to lead the market: scaling up. She discusses where the idea that "scaling is dead" comes from and what to consider next.

Over the last decade, LLMs have emerged as the presumed path to AI, or what experts call AGI, yet they still fall short on accuracy. LLM labs have largely followed one brute-force rule: add more weights and more compute to outperform other available models. Up to a point, it has worked; with more compute and data, LLMs have outperformed their predecessors and competitors. But the landscape is changing. Much smaller, newer models, some under 13B parameters, are now outperforming earlier models with far more parameters. For example, Falcon 180B is easily outperformed by Llama 3 8B, Command R 35B, and Gemma 3 27B. Likewise, Aya 23 8B and Aya Expanse 8B have outperformed BLOOM 176B with roughly 95% fewer weights.
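To make the size gap concrete, here is a minimal Python sketch of the parameter-count arithmetic behind these comparisons. It assumes the nominal sizes in the model names (8B, 27B, 35B, 176B, 180B) can be taken at face value; the helper name `percent_fewer_params` is made up for illustration, and the script says nothing about benchmark scores.

```python
def percent_fewer_params(small_b: float, large_b: float) -> float:
    """Return how many percent fewer parameters the smaller model has."""
    return (1 - small_b / large_b) * 100


# Nominal sizes in billions of parameters, read off the model names
# (an assumption: exact parameter counts differ slightly).
comparisons = {
    "Aya 23 8B vs BLOOM 176B": (8, 176),
    "Llama 3 8B vs Falcon 180B": (8, 180),
    "Gemma 3 27B vs Falcon 180B": (27, 180),
    "Command R 35B vs Falcon 180B": (35, 180),
}

for label, (small, large) in comparisons.items():
    reduction = percent_fewer_params(small, large)
    print(f"{label}: {reduction:.1f}% fewer parameters")
```

On these nominal sizes, the 8B models carry only about one twentieth of the weights of the 176B to 180B models they are compared against.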

The above image of the Hugging Face Open LLM Leaderboard shows smaller models significantly outperforming larger ones, with both groups approaching a plateau as transformer performance levels off. This suggests that a bigger size does not guarantee better performance.
