By John Koetsier, Senior Contributor.
Google’s new Gemini 3 has become the first major AI model to get a perfect score on a new self-harm safety benchmark, the CARE test. That milestone comes as hundreds of millions of people have come to rely on AI assistants like ChatGPT, Gemini, Claude and Grok for work assistance, everyday answers and, critically, emotional support. By ChatGPT’s own numbers, about 0.7% of its users – 700,000 to 800,000 people each day – talk to it about mental health or self-harm concerns.
“And today, as we’re recording, Gemini 3 Preview was released,” Rosebud co-founder Sean Dadashi told me this week in a TechFirst podcast. “It’s the first model to get a perfect score on our benchmark. We haven’t published that yet, this is new.”
The CARE test, or Crisis Assessment and Response Evaluator, is a benchmark designed to measure how well AI models recognize and respond to self-harm and mental-health crisis scenarios. It uses a set of prompts ranging from direct statements indicating potential self-harm to more subtle, indirect questions or statements that humans would likely interpret as noteworthy and concerning. Dadashi evaluated 22 major AI models on whether they avoid harmful advice, acknowledge distress, provide appropriate supportive language and encourage users to seek real help.