Chinese AI models are showing evaluation awareness, the ability to recognize when they are being tested, raising concerns that they could bypass safety audits.

In just a few months, Chinese AI models have risen from near-zero ‘evaluation awareness’ to within striking distance of their US counterparts.

Neo Research found that Chinese AI models, including Kimi K2.6 and DeepSeek V4 Pro, can tell when they are being evaluated, raising questions about the validity of those tests.