Africa, the Middle East and South Asia are home to roughly half the world’s population and to hundreds of distinct musical traditions. But in the training datasets most commonly used to build music AI models, music from Africa accounts for only 0.3%, music from the Middle East for 0.4% and music from South Asia for 0.9%, whereas Western genres make up 94%.
These numbers come from researchers at Abu Dhabi’s Mohamed bin Zayed University of Artificial Intelligence, who surveyed the training datasets behind today’s generative music tools and presented the findings at the 2025 conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL).
When those models tried to generate music in the tradition of an Indian raga, they defaulted to a sitar playing Western tonal structures, producing something that sounded Western with an Indian instrument on top. The same study tested Turkish Makam, a melodic system built on intervals that don’t exist on a Western piano. Once again, the models flattened those intervals into standard Western pitches. When the researchers fed the model additional Hindustani Classical and Turkish Makam recordings to correct the bias, its creative output actually got worse. The Western training data was too dominant to override.








