When a caller code-switches mid-sentence, most voice agents lose what makes them sound native. Cadence slips, the response lands like a translation, and trust drops. Teams patch it by routing between language-specific TTS models, but the handoff adds latency and makes voice behavior inconsistent inside the same conversation. Rime's Arcana V3 line is built for that moment: natural code-switching at production speed without turning multilingual into a routing problem.

Starting today, Together AI, the AI Native Cloud, is adding Rime Arcana V3 Turbo and Rime Arcana V3 to the Together Model Library. V3 Turbo delivers English–Spanish code-switching at ~120ms time-to-first-audio on dedicated endpoints, with prosody trained on bilingual speech patterns. V3 expands switching across 11 languages from a single model. Both run co-located with your LLM and STT workloads behind the same API, authentication, and observability surface you already use.
Audio sample (English, German, French, Japanese): "Hi, thanks for calling customer support. I can help you in multiple languages."
V3 Turbo: Performance for real-time bilingual conversations

~120ms time-to-first-audio

Voice agents need end-to-end latency under 700ms to feel conversational, which means TTS must leave headroom for STT and LLM processing. V3 Turbo hits ~120ms time-to-first-audio on Together AI dedicated endpoints, so when a customer switches from English to Spanish mid-sentence, the agent's bilingual response arrives in stride. Co-locating V3 Turbo with LLM and STT on Together AI keeps the full pipeline (speech recognition through reasoning to synthesis) within that 700ms budget.

English-Spanish code-switching trained on native bilingual speech

Bilingual callers mix languages inside a sentence. V3 Turbo is trained on those patterns, including where pauses land and how stress shifts at the boundary. A customer says, "I need help with my account, es que no puedo acceder." V3 Turbo can respond in the same mixed register, with pauses and emphasis that match how bilingual speakers actually talk.

Efficient concurrency for high-volume deployments

V3 Turbo's performance enables higher concurrency per GPU. For contact centers handling thousands of concurrent calls in bilingual markets, this means fewer GPUs to maintain production latency when customers code-switch, reducing total cost of ownership while preserving conversational quality.

V3: Multilingual breadth with code-switching

~160ms time-to-first-audio across 11 languages

V3 reaches ~160ms p50 time-to-first-audio on Together AI dedicated endpoints while supporting code-switching across 11 languages. This keeps multilingual conversations responsive even as the model handles the complexity of natural transitions between any supported language pair.

11 languages with natural transitions

V3 supports 11 languages and can code-switch between supported languages. A customer starts in French, switches to English for a technical term, then back to French for clarification. V3 handles these transitions while preserving prosody and accent consistency.

Single model for multilingual markets

V3 lets teams consolidate what used to require separate models or vendors per language. Deploy once and serve multilingual customers from a single endpoint without maintaining separate infrastructure per market. When the conversation switches languages, V3 keeps cadence and emphasis natural so the transition does not sound stitched together.
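
To make the 700ms conversational budget concrete, here is a minimal sketch of the arithmetic. It is an illustration, not a benchmark. The 700ms ceiling and the ~120ms and ~160ms time-to-first-audio figures come from the numbers quoted above. The STT and LLM timings are hypothetical placeholders, and the helper names (`remaining_headroom_ms`, `p50`) are made up for this example.

```python
from statistics import median

# Figures quoted in the post.
CONVERSATIONAL_BUDGET_MS = 700  # end-to-end ceiling for a turn to feel conversational
TTS_TTFA_MS = {"v3_turbo": 120, "v3": 160}  # approximate time-to-first-audio


def remaining_headroom_ms(tts_ms, stt_ms, llm_ms, budget_ms=CONVERSATIONAL_BUDGET_MS):
    """Return the budget left after STT, LLM, and TTS (negative means over budget)."""
    return budget_ms - (stt_ms + llm_ms + tts_ms)


def p50(samples_ms):
    """Median of measured latencies, the statistic quoted for V3."""
    return median(samples_ms)


# Hypothetical STT and LLM timings, chosen only for illustration.
stt_ms, llm_ms = 150, 350

for name, tts_ms in TTS_TTFA_MS.items():
    left = remaining_headroom_ms(tts_ms, stt_ms, llm_ms)
    print(f"{name}: {left} ms of headroom")
# prints:
# v3_turbo: 80 ms of headroom
# v3: 40 ms of headroom
```

In practice you would replace the placeholder STT and LLM values with measurements from your own pipeline and pass a list of observed time-to-first-audio samples to `p50` to compare against the quoted figures.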






