Benchmarks have become essential for enterprises, allowing them to choose models whose performance matches their needs. But not all benchmarks are built the same: many evaluate models against static datasets or fixed testing environments.

Researchers from Inclusion AI, which is affiliated with Alibaba’s Ant Group, have proposed a new model leaderboard and benchmark that focuses on a model’s performance in real-life scenarios. They argue that LLMs need a leaderboard that accounts for how people actually use them and how much people prefer their answers, rather than measuring only a model’s static knowledge.

In a paper, the researchers laid out the foundation for Inclusion Arena, a leaderboard that ranks models based on user preferences.