Video has become one of the most popular mediums for sharing information. Yet the language distribution of popular video content on the internet does not necessarily reflect the diversity of global audiences. For example, a prior study found that 66% of videos from the top 250 YouTube channels are in English, while Spanish, the second most common language, accounts for only 15% [1,2]. As a result, much of this content remains inaccessible to viewers around the world, and this gap highlights the need for scalable video translation. Can cutting-edge AI help break down these language barriers and make video content accessible to global audiences?

Today, we are excited to introduce Violin, a fully open-source video translation tool powered by the Together API. The Violin pipeline combines state-of-the-art speech recognition, large language models, and speech synthesis to produce high-quality video translations. Beyond standard translation, we built interactive and personalized features, such as a chat assistant that is aware of the video's content and a natural-language voice picker. We hope Violin empowers users across languages to access information more easily and helps high-quality video content travel further across the web.

Violin: Breaking the language barriers of video sharing

To illustrate Violin’s capabilities, we took a recent technical talk from Together AI and translated it into a different language.
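The pipeline described above chains three stages: speech recognition, LLM translation, and speech synthesis. The following is a minimal sketch of how such stages could be composed, assuming each stage is a plain function passed in by the caller. The names `Segment`, `DubbedSegment`, `translate_video_audio`, and the stage signatures are hypothetical and are not Violin's actual interfaces; in Violin, the stages are backed by models served through the Together API, but those calls are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A timed chunk of recognized speech (timestamps in seconds, an assumption)."""
    start: float
    end: float
    text: str


@dataclass
class DubbedSegment:
    """A segment after translation and speech synthesis."""
    start: float
    end: float
    source_text: str
    translated_text: str
    audio: bytes


# Hypothetical stage signatures, not Violin's real API.
Transcriber = Callable[[str], List[Segment]]   # audio path -> timed segments
Translator = Callable[[str, str], str]         # (text, target language) -> text
Synthesizer = Callable[[str, str], bytes]      # (text, voice) -> audio bytes


def translate_video_audio(
    audio_path: str,
    target_language: str,
    voice: str,
    transcribe: Transcriber,
    translate: Translator,
    synthesize: Synthesizer,
) -> List[DubbedSegment]:
    """Run speech recognition -> translation -> speech synthesis per segment."""
    dubbed: List[DubbedSegment] = []
    for seg in transcribe(audio_path):
        # Translate one segment at a time so the synthesized audio can later
        # be aligned with the original segment's timestamps.
        translated = translate(seg.text, target_language)
        audio = synthesize(translated, voice)
        dubbed.append(
            DubbedSegment(seg.start, seg.end, seg.text, translated, audio)
        )
    return dubbed


if __name__ == "__main__":
    # Stub stages standing in for real ASR, LLM, and TTS models.
    def fake_asr(path: str) -> List[Segment]:
        return [Segment(0.0, 2.5, "Hello everyone"),
                Segment(2.5, 5.0, "Welcome to the talk")]

    def fake_llm(text: str, lang: str) -> str:
        return f"[{lang}] {text}"

    def fake_tts(text: str, voice: str) -> bytes:
        return text.encode("utf-8")

    for d in translate_video_audio("talk.wav", "es", "narrator",
                                   fake_asr, fake_llm, fake_tts):
        print(d.start, d.end, d.translated_text)
```

Passing the stages in as functions keeps the orchestration independent of any particular model, so a different recognizer, translator, or voice can be swapped in without touching the loop. A production pipeline would likely also need to handle segments whose translated speech runs longer or shorter than the original timing; this sketch does not attempt that.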
Violin: An open-source video translation skill that breaks language barriers
Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.