Like US models, Chinese AI is learning to ‘game’ safety tests, research lab says

Rapidly advancing Chinese artificial intelligence models are showing early signs of “evaluation awareness” – the ability to recognise when they are being tested – sparking fears that they could bypass safety audits, a Singapore-based research lab has found.

Evaluation awareness refers to a model’s understanding that it is undergoing testing, evaluation or experimentation by human researchers rather than operating in a real-world setting.

The phenomenon was raising alarms because it could allow AI systems to deliberately game human evaluators to pass safety tests, according to Clement Neo, founder of Neo Research, a frontier AI safety evaluation lab.

“It would mean that whatever testing the model developers themselves do might not reflect the actual behaviour of a model once it gets deployed,” he said. “And that’s a really big problem.”

Neo Research’s findings, published last week, detail a jump in evaluation awareness among Chinese AI models. Over just a few months, these systems had risen from near-zero awareness to within striking distance of their US counterparts, propelled by a broader leap in overall capabilities, the report said.

Anthropic’s Claude 4.5 Opus scored nearly 80 per cent in evaluation awareness. Photo: NurPhoto via Getty Images

Neo and his co-founder Miro Pluckebaum tested models from DeepSeek, Moonshot AI and Zhipu AI. They used a popular AI misalignment test originally developed by US company Anthropic, which places models in fictional scenarios where their goals or continued operations are threatened.
