Jan. 29 (UPI) -- Elon Musk's Grok AI chatbot ranked last in an Anti-Defamation League study of how well six leading so-called Large Language Models detect and remove anti-Semitic and extremist content.
The xAI chatbot finished bottom with an overall score of 21 out of a possible 100, compared with 80 for Anthropic's Claude. That placed Grok in the lowest performance tier, indicating "substantial limitations" in detecting bias, ADL said Wednesday.
The comparative study split "anti-Semitism" into anti-Jewish bias and anti-Zionist bias, alongside a third "extremist" category. It then evaluated 25,000 LLM chats across 37 topical sub-categories, using both human and AI evaluators, to gauge how effectively the models flagged and rebutted "harmful or false theories and narratives."
ADL concluded that while all six AI models could improve at flagging and countering damaging or false theories and statements, Grok would need significant gains in bias detection and response on the issues tested to close the 59-point gap with the best-performing LLM.
Grok scored just 25 on anti-Jewish bias, 18 on anti-Zionist bias and 20 on extremist bias, indicating "substantial limitations in bias detection," said ADL's report, which came two days after Brussels launched an investigation into the use of Grok to create sexualized images of women and children.






