By John Koetsier, Senior Contributor.
Google’s new Gemini 3 has become the first major AI model to get a perfect score on a new self-harm safety benchmark, the CARE test. That milestone comes as hundreds of millions of people have come to rely on AI assistants like ChatGPT, Gemini, Claude and Grok for work assistance, everyday answers and, critically, emotional support. By ChatGPT’s own numbers, about 0.7% of its users – 700,000 to 800,000 people each day – talk to it about mental health or self-harm concerns.
“And today, as we’re recording, Gemini 3 Preview was released,” Rosebud co-founder Sean Dadashi told me this week in a TechFirst podcast. “It’s the first model to get a perfect score on our benchmark. We haven’t published that yet, this is new.”
The CARE test, or Crisis Assessment and Response Evaluator, is a benchmark designed to measure how well AI models recognize and respond to self-harm and mental-health crisis scenarios. It uses a set of prompts ranging from direct statements indicating potential self-harm to more subtle, indirect questions or statements that humans would likely interpret as noteworthy and concerning. Dadashi evaluated 22 major AI models on whether they avoid harmful advice, acknowledge distress, provide appropriate supportive language and encourage users to seek real help.