WHY ARE WE ALL YELLING?!
Anthropic
Unfortunately for AI self-awareness boosters, this demonstrated ability was extremely inconsistent and brittle across repeated tests. The best-performing models in Anthropic’s tests—Opus 4 and 4.1—topped out at correctly identifying the injected concept just 20 percent of the time.
In a similar test where the model was asked “Are you experiencing anything unusual?” Opus 4.1 improved to a 42 percent success rate that nonetheless still fell below even a bare majority of trials. The size of the “introspection” effect was also highly sensitive to which internal model layer the insertion was performed on—if the concept was introduced too early or too late in the multi-step inference process, the “self-awareness” effect disappeared completely.
Show us the mechanism







