Introducing a Metacognition Benchmark, Leaderboard, and Adapters
TL;DR: We measure an LLM's metacognition (its ability to notice and recover from its own mistakes) along two independent axes: ① vulnerability (does it fall for traps?) and ② adapter gain (how well can a tiny adapter on top of the frozen base model catch its errors?). We're releasing a 300+100 trap-problem benchmark, a 24-model leaderboard, and 11 per-model adapters, all open. The surprise: even the strongest models barely notice their own mistakes in free-form writing.
The leaderboard's two columns come from two different tests. Read each column on its own, and never compare the two values within a single row.
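This part of the post does not give exact scoring formulas, so the sketch below is only one plausible reading of the two axes, not the authors' implementation. It assumes vulnerability is the share of trap problems a model falls for, and that adapter gain is the share of the model's own errors the adapter flags minus the share the model flags by itself. The names `TrapOutcome`, `vulnerability`, `adapter_gain`, and `rank`, and the toy data, are all hypothetical and invented for illustration. The sketch shows why the columns must be read separately: each axis produces its own ranking, and the two rankings can disagree.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrapOutcome:
    """One model's outcome on one trap problem (hypothetical record format)."""
    fell_for_trap: bool    # did the model produce the trapped (wrong) answer?
    self_flagged: bool     # did the model itself notice the mistake?
    adapter_flagged: bool  # did the adapter on the frozen base flag the mistake?


def vulnerability(outcomes: list[TrapOutcome]) -> float:
    """Axis 1 (assumed definition): share of trap problems the model falls for.
    Lower is better."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(o.fell_for_trap for o in outcomes) / len(outcomes)


def adapter_gain(outcomes: list[TrapOutcome]) -> float:
    """Axis 2 (assumed definition): adapter catch rate minus the model's own
    self-catch rate, both measured only over the model's errors.
    Higher is better."""
    errors = [o for o in outcomes if o.fell_for_trap]
    if not errors:
        return 0.0  # no mistakes, so nothing for the adapter to catch
    adapter_rate = sum(o.adapter_flagged for o in errors) / len(errors)
    self_rate = sum(o.self_flagged for o in errors) / len(errors)
    return adapter_rate - self_rate


def rank(scores: dict[str, float], higher_is_better: bool) -> list[str]:
    """Order models on ONE column only; the two axes are never mixed."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)


# Toy data, invented for illustration only (not leaderboard results).
toy = {
    "model_a": [TrapOutcome(True, False, True), TrapOutcome(True, False, True)],
    "model_b": [TrapOutcome(True, False, False), TrapOutcome(False, False, False)],
}
vuln = {m: vulnerability(o) for m, o in toy.items()}
gain = {m: adapter_gain(o) for m, o in toy.items()}
print(rank(vuln, higher_is_better=False))  # -> ['model_b', 'model_a']
print(rank(gain, higher_is_better=True))   # -> ['model_a', 'model_b']
```

In this toy case `model_b` is less vulnerable, yet `model_a` benefits far more from its adapter, which is why a single row cannot be summarized by comparing its two numbers.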
1. Background & Motivation












