Your LLM can't read. Here's the weird trick it uses instead

Saturday, 13 June 2026

TL;DR

LLMs convert text to numeric token IDs via Byte Pair Encoding: rare words split into fragments, while common ones stay whole. Per-token billing hides costs, because numbers and non-English text inflate token counts, and token boundaries explain some unexpected model behavior.

Here's a fact that breaks people's mental model of large language models the first time they really sit with it:

A language model never sees your words. Not one. It sees numbers — and only numbers.

When you type "Hello, world" into ChatGPT, the model on the other end isn't reading English. By the time your text reaches the neural network, it's been chopped into chunks called tokens, and each chunk has been swapped for an integer ID. The model is, underneath all the magic, a very expensive function that maps a sequence of integers to the next integer. The "intelligence" is what happens in between.

Let's actually look at it.

See it for yourself (5 lines of Python)
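
A minimal sketch of the idea, in plain Python with no tokenizer library. Everything in it is an assumption made for illustration: the tiny MERGED vocabulary, the ID numbering (0-255 for raw bytes, 256 and up for multi-byte chunks), and the greedy longest-match rule, which is a stand-in for real Byte Pair Encoding rather than an implementation of it. It is longer than five lines and is not the tokenizer any real model uses.

```python
from __future__ import annotations

# Hypothetical multi-byte tokens, made up for this example.
# IDs 0-255 are reserved for raw bytes; these get IDs 256, 257, ...
MERGED = [b"Hello", b" world", b" token", b"ization"]
TOKEN_TO_ID = {tok: 256 + i for i, tok in enumerate(MERGED)}
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}
MAX_LEN = max(len(tok) for tok in MERGED)


def encode(text: str) -> list[int]:
    """Turn text into integer IDs using greedy longest-match (not real BPE)."""
    data = text.encode("utf-8")
    ids, pos = [], 0
    while pos < len(data):
        # Try the longest candidate chunk first, down to 2 bytes.
        for length in range(min(MAX_LEN, len(data) - pos), 1, -1):
            piece = data[pos:pos + length]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                pos += length
                break
        else:
            # No known chunk starts here: fall back to the raw byte value.
            ids.append(data[pos])
            pos += 1
    return ids


def decode(ids: list[int]) -> str:
    """Map IDs back to bytes, then back to text."""
    data = b"".join(ID_TO_TOKEN[i] if i >= 256 else bytes([i]) for i in ids)
    return data.decode("utf-8")


print(encode("Hello, world"))  # [256, 44, 257]
print(encode("Hello, wurld"))  # [256, 44, 32, 119, 117, 114, 108, 100]
print(encode("é"))             # [195, 169]
```

In this toy, the familiar phrase costs three IDs, the misspelled one costs eight because "wurld" falls back to single bytes, and a single accented character costs two. Real vocabularies are learned from data and are far larger, but the shape of the effect is the same: what the model receives is the list of integers, never the letters.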
