A Chinese firm has just launched a constantly changing set of AI benchmarks

Venture capital company Hongshan Capital Group originally developed it to assess its potential investments. Now it’s opening it up for anyone to use.

When testing an AI model, it’s hard to tell if it is reasoning or just regurgitating answers from its training data. Xbench, a new benchmark developed by the Chinese venture capital firm HSG, or Hongshan Capital Group, might help to sidestep that issue. That’s thanks to the way it evaluates models not only on the ability to pass arbitrary tests, like most other benchmarks, but also on the ability to execute real-world tasks, which is more unusual. It will be updated on a regular basis to try to keep it evergreen.

This week the company is making part of its question set open-source and letting anyone use it for free. The team has also released a leaderboard comparing how mainstream AI models stack up when tested on Xbench. (ChatGPT o3 ranked first across all categories, though ByteDance’s Doubao, Gemini 2.5 Pro, and Grok all still did pretty well, as did Claude Sonnet.)

Development of the benchmark at Hongshan began in 2022, following ChatGPT’s breakout success, as an internal tool for assessing which models are worth investing in. Since then, led by partner Gong Yuan, the team has steadily expanded the system, bringing in outside researchers and professionals to help refine it. As the project grew more sophisticated, they decided to release it to the public.
