TL;DR

I've been running an "fMRI for LLMs" — capturing the full internal activations of dense open models (Qwen2.5-7B, Gemma-2-9B, Gemma-4-12B) and applying neuroscience methods to map how meaning is organized. The headline result, confirmed causally and across all three models: a concept is not stored in a region of neurons — it is a single direction in activation space.

1. Meaning lives in a direction, not a region

In the brain, some categories map onto localized regions (faces → fusiform face area). The LLMs I tested are the opposite.

Distributed, superposed code. A 10-way linear category probe decodes far above chance (Gemma-2 0.97, Qwen 0.80; chance is 0.10 if the classes are balanced), yet the "most selective" units do not replicate across two random halves of the stimuli (overlap ≈ 0.00–0.05). There is no findable "animal neuron."
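To make the two measurements concrete, here is a minimal sketch of how a linear probe and a split-half unit-selectivity check could be computed. Everything in it is an assumption for illustration, not my actual pipeline: the activations are synthetic Gaussian stand-ins, the probe is a ridge regression onto one-hot labels, selectivity is a per-unit d′ (in-category vs. out-of-category), and "overlap" is taken to be the Jaccard index of the top-k units from each half. Synthetic data like this will not reproduce the numbers above; it only shows the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 10 categories, 100 stimuli each, 256 units.
n_classes, n_per_class, n_units = 10, 100, 256
labels = np.repeat(np.arange(n_classes), n_per_class)
class_means = 0.3 * rng.standard_normal((n_classes, n_units))
acts = class_means[labels] + rng.standard_normal((labels.size, n_units))


def split_halves(n, rng):
    """Randomly split stimulus indices 0..n-1 into two halves."""
    perm = rng.permutation(n)
    return perm[: n // 2], perm[n // 2:]


def probe_accuracy(acts, labels, train_idx, test_idx, ridge=10.0):
    """Fit a ridge-regression probe to one-hot labels; return held-out accuracy."""
    mu = acts[train_idx].mean(axis=0)
    x_train = acts[train_idx] - mu
    x_test = acts[test_idx] - mu
    y = np.eye(labels.max() + 1)[labels[train_idx]]
    y_mean = y.mean(axis=0)
    gram = x_train.T @ x_train + ridge * np.eye(x_train.shape[1])
    weights = np.linalg.solve(gram, x_train.T @ (y - y_mean))
    pred = (x_test @ weights + y_mean).argmax(axis=1)
    return float((pred == labels[test_idx]).mean())


def top_units(acts, labels, idx, category, k):
    """Return the k units with the highest d' for `category` on stimuli `idx`."""
    a, y = acts[idx], labels[idx]
    inside, outside = a[y == category], a[y != category]
    pooled_sd = np.sqrt(0.5 * (inside.var(axis=0) + outside.var(axis=0))) + 1e-8
    d_prime = (inside.mean(axis=0) - outside.mean(axis=0)) / pooled_sd
    return set(np.argsort(d_prime)[-k:].tolist())


def split_half_overlap(acts, labels, half_a, half_b, category=0, k=10):
    """Jaccard overlap of the top-k selective units found in each half."""
    a = top_units(acts, labels, half_a, category, k)
    b = top_units(acts, labels, half_b, category, k)
    return len(a & b) / len(a | b)


half_a, half_b = split_halves(labels.size, rng)
accuracy = probe_accuracy(acts, labels, half_a, half_b)
overlap = split_half_overlap(acts, labels, half_a, half_b)
print(f"probe accuracy: {accuracy:.2f} (chance = {1 / n_classes:.2f})")
print(f"top-unit overlap across halves: {overlap:.2f}")
```

With real activations, `acts` would be one layer's captured hidden states (stimuli × units) and `labels` the category of each stimulus. The pattern described above corresponds to high `accuracy` together with `overlap` near zero.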