Does Your LLM Know *When It's About to Be Wrong*?

Introducing a Metacognition Benchmark, Leaderboard, and Adapters

TL;DR: We measure an LLM's metacognition (its ability to notice and recover from its own mistakes) along two independent axes: ① vulnerability (does it fall for traps?) and ② adapter gain (how well can a tiny frozen-base adapter catch its errors?). We're releasing a 300+100 trap-problem benchmark, a 24-model leaderboard, and 11 per-model adapters, all open. The surprise: even the strongest models barely notice their own mistakes in free-form writing.

The two columns come from two different tests. Read each column on its own, and never compare the two values within a single row.

1. Background & Motivation
