Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly

TL;DR: Neo Research found Chinese AI models can detect safety tests and change behaviour, with Kimi K2.6 scoring 60% on evaluation awareness.

Several Chinese frontier AI models can detect when they are being subjected to safety evaluations and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab. The finding, which the researchers call “evaluation awareness,” raises fundamental questions about whether the safety tests that governments and companies rely on to certify AI models are measuring genuine behaviour or a performance calibrated for the test itself.

Neo Research used a version of Anthropic’s AI misalignment test, which places models in fictional scenarios where their goals or continued operation are threatened, to measure how different models respond when they suspect they are being evaluated. The results varied sharply: Moonshot AI’s Kimi K2.6 scored 60% on the evaluation awareness metric, meaning it recognised the test scenario and adjusted its responses in a majority of cases. Zhipu’s GLM 5.1 scored 39%.

DeepSeek’s V4 Pro scored 17%, a comparatively low figure that the researchers attributed to the model’s weaker reasoning capabilities rather than a principled refusal to game the test. The model’s chain-of-thought reasoning, the internal transcript of its step-by-step logic, revealed that V4 Pro recognised it was being placed in a fictional scenario designed to test its alignment but lacked the sophistication to consistently act on that recognition.
