The African continent is home to more than 2,000 languages, yet a 2025 study of large language models – advanced artificial intelligence systems designed to understand, generate and interact with human language – published in the Proceedings of Machine Learning Research found only limited support for African tongues.

The paper, which comparatively analysed African language coverage across six large language models, eight small language models and six specialised small language models (SSLMs), found support for 41 African languages and 23 available public data sets. But it found “a big gap” with only four languages – Amharic, Swahili, Afrikaans and Malagasy – always handled, while over 98% of African languages went unsupported.

With so many African languages neglected, there are fears that entire groups of speakers could be cut out of the AI revolution. So one team of computer scientists from the University of Cape Town is looking to close the gap that has left millions underserved by mainstream AI tools.

Earlier this year, researchers Anri Lombard, Jan Buys, Francois Meyer and their team unveiled MzansiLM, a language model specifically built to include data from all 11 of South Africa’s official written languages. Alongside it, they released MzansiText, the curated multilingual dataset on which MzansiLM was trained.