There is a particular sleight of hand at the heart of modern LLM reasoning that bothers me more the longer I look at it. The argument goes like this: Transformers are shallow. A 70-layer stack has fixed depth. Theoretical analyses place it in circuit complexity classes like AC⁰ or TC⁰ (depending on how attention and numerical precision are modeled), which is a polite way of saying it cannot, in a single forward pass, solve problems that fundamentally require sequential computation. So we paper over this by making the model think out loud. We give it a scratchpad. We call it Chain-of-Thought. We celebrate.

But CoT is not reasoning. CoT is the model renting depth from its own output tokens. Every reasoning step has to leave the residual stream, be collapsed into a discrete token from a vocabulary built for human communication, and come back in through the embedding layer for the next step. Mechanically, it is an absurd way to do internal computation, like a CPU that must spill every intermediate register to disk in plaintext English.

Sapient Intelligence’s bet, made first with the original Hierarchical Reasoning Model paper last summer and now extended into the language domain with HRM-Text, is that this is fixable. Not by making the model bigger, not by training on more CoT traces, but by giving the architecture the one thing it lacks: variable internal depth. Reasoning that happens in the latent space, not in the token stream.

It’s worth thinking carefully about what they did, and what it does and doesn’t yet prove.
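The “renting depth” contrast can be made concrete with a deliberately tiny toy in pure Python. It is not HRM and not a real Transformer. The “layer” is a hypothetical elementwise residual update, and the vocabulary is eight signed one-hot vectors. Names like `cot_reason`, `latent_reason` and `too_big` are illustrative only. The CoT-style loop squeezes the state through a greedy argmax over the vocabulary after every pass, so only about log2(V) bits survive from one step to the next. The latent loop keeps the full continuous vector and stops when a hypothetical halting predicate fires, so its effective depth varies with the input.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical toy vocabulary: 8 signed one-hot "token embeddings" in 4 dims.
DIM = 4
EMBEDDINGS: List[Vector] = [
    [s * (1.0 if i == j else 0.0) for j in range(DIM)]
    for i in range(DIM)
    for s in (1.0, -1.0)
]
# Two toy "layers" per forward pass: a fixed-depth stack.
WEIGHTS: List[float] = [1.0, 1.0]


def layer(h: Vector, w: float) -> Vector:
    # Toy residual block: h + tanh(w * h), applied elementwise.
    return [x + math.tanh(w * x) for x in h]


def forward(h: Vector, weights: List[float]) -> Vector:
    # One fixed-depth pass through the whole stack.
    for w in weights:
        h = layer(h, w)
    return h


def nearest_token(h: Vector, embeddings: List[Vector]) -> int:
    # Greedy decode: the vocabulary row with the largest dot product wins.
    scores = [sum(a * b for a, b in zip(h, e)) for e in embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


def cot_reason(
    h: Vector, weights: List[float], embeddings: List[Vector], steps: int
) -> Tuple[List[int], Vector]:
    # CoT-style loop: after every pass the state is collapsed to a token id
    # and re-enters through the embedding table.
    tokens: List[int] = []
    for _ in range(steps):
        tok = nearest_token(forward(h, weights), embeddings)
        tokens.append(tok)
        h = list(embeddings[tok])  # everything except the token id is discarded
    return tokens, h


def latent_reason(
    h: Vector,
    weights: List[float],
    halt: Callable[[Vector], bool],
    max_steps: int,
) -> Tuple[Vector, int]:
    # Latent loop: the continuous state is carried forward untouched, and the
    # number of passes (the effective depth) depends on when `halt` fires.
    for depth in range(1, max_steps + 1):
        h = forward(h, weights)
        if halt(h):
            return h, depth
    return h, max_steps


def too_big(h: Vector) -> bool:
    # Hypothetical halting rule: stop once any coordinate exceeds 2.0 in magnitude.
    return max(abs(x) for x in h) > 2.0


if __name__ == "__main__":
    print("bits carried per CoT step:", math.log2(len(EMBEDDINGS)))
    tokens, _ = cot_reason([0.3, -0.2, 0.5, 0.1], WEIGHTS, EMBEDDINGS, steps=3)
    print("CoT tokens:", tokens)
    for x in ([1.5, 0.2, -0.3, 0.4], [0.1, -0.1, 0.05, 0.1]):
        _, depth = latent_reason(x, WEIGHTS, too_big, max_steps=10)
        print(x, "halted after", depth, "step(s)")
```

In this toy, the input `[1.5, 0.2, -0.3, 0.4]` halts after one latent step, while `[0.1, -0.1, 0.05, 0.1]` needs three. That is the “variable depth” idea in miniature. The CoT state, by contrast, is always exactly one of the eight embedding rows, whatever the model computed in between.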