Adding Benchmaxxer Repellant to the Open ASR Leaderboard

"When a measure becomes a target, it ceases to be a good measure." (Goodhart’s Law)

TLDR: Appen Inc. and DataoceanAI have provided high-quality English ASR datasets covering scripted and conversational speech across multiple accents. To guard against benchmaxxing and test-set contamination, we will keep these datasets private, so they remain a trustworthy measure of performance on multiple tasks.

We’re not updating the average WER at this time: by default, the leaderboard’s Average WER remains computed on public datasets only. You can optionally include the private datasets using the toggle to see their impact 👀
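To make the toggle concrete, here is a minimal sketch of how such an average could be computed. It assumes the Average WER is an unweighted mean of per-dataset WERs, which may not match the leaderboard's exact implementation. The dataset names, the WER values, and the `average_wer` helper are all hypothetical and for illustration only.

```python
from statistics import mean

# Hypothetical per-dataset WERs (in %) for a single model; not real results.
PUBLIC_WER = {"public_a": 5.0, "public_b": 7.0, "public_c": 9.0}
PRIVATE_WER = {"private_scripted": 6.0, "private_conversational": 13.0}


def average_wer(public, private, include_private=False):
    """Unweighted mean of per-dataset WERs.

    Assumption: private datasets only count when the toggle is on,
    mirroring the default "public datasets only" behaviour.
    """
    scores = dict(public)  # copy so the inputs are never mutated
    if include_private:
        scores.update(private)
    return mean(scores.values())


default_avg = average_wer(PUBLIC_WER, PRIVATE_WER)
toggled_avg = average_wer(PUBLIC_WER, PRIVATE_WER, include_private=True)
print(f"public only: {default_avg:.2f}  with private: {toggled_avg:.2f}")
```

With these made-up numbers, turning the toggle on raises the average. A gap like that between public and private scores is the kind of signal the private sets are meant to surface.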

Since its launch in September 2023, the Open ASR Leaderboard has been visited over 710K times. We’re blown away by the community’s interest and motivation to keep pushing speech recognition 🗣️

