Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags

Supertone has released Supertonic 3, the third generation of its on-device, ONNX-based text-to-speech (TTS) system. Supertonic 3 ships with 31-language support, improved reading accuracy, fewer repeat and skip failures, and v2-compatible public ONNX assets. It is billed as a lightning-fast, on-device, multilingual, and accurate TTS system.

What Changed from v2 to v3

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages. Version 2 supported English, Korean, Spanish, Portuguese, and French. Version 3 adds 26 more, including Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Estonian, Finnish, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, and Vietnamese, for 31 ISO language codes in total. There is also a special na fallback for text whose language is unknown or outside the supported set.
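The fallback behavior described above can be sketched in a few lines. This is a minimal illustration, not Supertonic's actual API: the function name resolve_language, the SUPPORTED set, and the tag normalization are all hypothetical, and the ISO 639-1 codes are an assumed mapping for the languages named in this article.

```python
# Hypothetical helper: pick a language code for a TTS request, falling back
# to "na" when the language is unknown or unsupported. The codes below are
# assumed ISO 639-1 codes for the languages named in the article.
SUPPORTED = frozenset({
    "en", "ko", "es", "pt", "fr",              # carried over from v2
    "ja", "ar", "bg", "cs", "da", "de", "el",  # named as added in v3
    "et", "fi", "hr", "hu", "id", "it", "lt",
    "lv", "nl", "pl", "ro", "ru", "sk", "sl",
    "sv", "tr", "uk", "vi",
})
FALLBACK = "na"


def resolve_language(tag):
    """Return a supported language code, or the "na" fallback.

    Accepts loose tags such as "EN", "pt-BR", or "de_DE" and keeps only
    the primary subtag before checking it against SUPPORTED.
    """
    if not tag:
        return FALLBACK
    primary = tag.strip().lower().replace("_", "-").split("-")[0]
    return primary if primary in SUPPORTED else FALLBACK


if __name__ == "__main__":
    for tag in ["ko", "pt-BR", "de_DE", "zz", None]:
        print(repr(tag), "->", resolve_language(tag))
```

Normalizing to the primary subtag is a design choice for the sketch: it lets regional tags map onto the base language rather than dropping straight to the fallback.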

The model grows modestly to accommodate the added languages. At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than open TTS systems in the 0.7B to 2B parameter class. The smaller model size is a practical advantage for download size, startup time, and on-device inference. The update also brings the total disk footprint of the public ONNX assets to 404 MB. Separately, Supertone recently launched Voice Builder, which lets developers create custom, edge-native TTS models from their own voice recordings.
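As a rough sanity check on those figures, the parameter count and disk footprint can be related with simple arithmetic. The sketch below assumes that 1 MB means 10^6 bytes and that the ONNX files are dominated by weights; neither assumption is stated in the release, and the helper name estimated_size_mb is made up for illustration.

```python
# Back-of-the-envelope size check. Assumptions (not from the release):
# 1 MB = 10**6 bytes, and the ONNX assets consist almost entirely of weights.
PARAMS = 99e6       # "about 99M parameters"
DISK_BYTES = 404e6  # "404 MB" total disk footprint

bytes_per_param = DISK_BYTES / PARAMS


def estimated_size_mb(params, bytes_per_param=4.0):
    """Estimate on-disk size in MB for a model stored at a given width."""
    return params * bytes_per_param / 1e6


if __name__ == "__main__":
    print(f"bytes per parameter: {bytes_per_param:.2f}")
    # Same storage width applied to the 0.7B and 2B classes for comparison.
    for params in (0.7e9, 2e9):
        print(f"{params / 1e9:.1f}B params: ~{estimated_size_mb(params):.0f} MB")
```

Under these assumptions the ratio comes out to roughly 4 bytes per parameter, which would be consistent with 32-bit floating-point weights, and the same storage width would put a 0.7B-parameter model at several gigabytes.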
