Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Introduction

Over half of the world's population speaks more than one language. And for many bilingual speakers, code-switching — seamlessly switching between languages, even mid-sentence — is a natural part of everyday communication. Whether in casual conversations, contact centers, or IT helpdesks, speakers fluidly adapt to whichever language feels most natural in the moment.

Despite the prevalence of bilingual speakers worldwide, little work has examined how voice agents handle code-switched speech in enterprise settings. So when a customer asked us how our voice agents would perform for their largely bilingual customer base, who routinely code-switch, we decided to build our own benchmark and dataset to evaluate models. We focused on automatic speech recognition (ASR), the first step in any voice agent pipeline, because transcription errors propagate forward into every downstream component. In enterprise settings, where a misrouted ticket or misunderstood policy question has real operational consequences, getting the transcript right is especially important.
