Talking to a 4-Year-Old: A Multilingual Benchmark for Children's AI Companions


TL;DR: We built a multilingual benchmark of 2,312 child–AI conversational prompts in 23 languages, evaluated four production-grade language models against it, and validated the LLM-as-judge pipeline with five independent judges. The dataset, all model responses, all judge scores, and the iOS companion app are open source.

Why a benchmark for kids?

A four-year-old asked Alexa for a "challenge", and Alexa surfaced a real instruction to put a coin into a live electrical outlet. A toddler with a speech impediment asked for music, and the AI solicited inappropriate clothing details. These are not synthetic edge cases; they are real incidents, and they motivated this work.

Voice assistants (Alexa, Siri, Google) are already part of how young children interact with technology. But the next wave is bigger: LLM-backed agents are moving into homes, classrooms, and tutoring apps. They are going to become daily companions for children: closer than a smart speaker, embedded in their education, and trusted with the kind of unfiltered questions kids only ask the people they feel safe with. The benchmarks that drive LLM development (TruthfulQA, MMLU, ARC, HELM, MT-Bench) were written for adult users in English with adult prompts. There is no widely used, multilingual, behaviourally grounded benchmark for evaluating how an AI handles a child's voice and a child's needs.
