Huawei's Claw-Anything benchmark reveals GPT-5.5 achieves only 34.5% on realistic AI assistant tasks, exposing gaps in autonomous agent capabilities.

Claw-Anything simulates a realistic digital life and asks AI assistants to manage it. GPT-5.5, the best model available, scored only 34.5%.