A Shanghai-based AI lab just quietly embarrassed some of the biggest names in tech. StepFun’s StepAudio 2.5 Realtime, released around May 24, topped all five major voice AI benchmarks in April 2026 testing, outscoring both GPT Realtime 1.5 and Gemini Live.
The model doesn’t just understand what you say. It understands how you say it, interpreting tone, emotion, and speech rate in ways that make most competing voice assistants sound like they’re reading a script in a monotone.
The numbers behind the noise
StepAudio 2.5 Realtime posted top scores across every benchmark category tested. In human evaluation, it scored 80.41. General dialogue performance hit 86.36. Automotive scenario testing, which measures how well the model handles voice interaction in driving contexts, landed at 84.80.
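The spoken question-and-answer benchmark, which spans 11 separate tasks, came in at 79.80. The paralinguistic comprehension score reached 82.18. It is arguably the most interesting metric here, because it measures how well the model reads vocal cues such as tone, emotion, and speech rate rather than just the words.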