Most people interested in generative AI likely already know that Large Language Models (LLMs) — like those behind ChatGPT, Anthropic’s Claude, and Google’s Gemini — are trained on massive datasets: trillions of words pulled from websites, books, codebases, and, increasingly, other media such as images, audio, and video. But why?
From this data, LLMs develop a statistical, generalized understanding of language, its patterns, and the world, encoded in billions of parameters, or “settings,” distributed across a network of artificial neurons (mathematical functions that transform input data into output signals).
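To make that concrete, here is a minimal sketch of a single artificial neuron in plain Python. The input values, weights, bias, and the choice of a sigmoid activation are illustrative assumptions for this example, not a description of any particular model’s internals; real LLMs chain billions of such units together.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of inputs plus a bias,
    passed through a nonlinearity (here, the sigmoid function)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # squashes the signal into (0, 1)

# Illustrative values only: in a real LLM, billions of such weights
# are adjusted during training rather than set by hand.
print(neuron(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.2))
```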
By being exposed to all this training data, LLMs learn to detect and generalize patterns, which are reflected in the parameters of their neurons. For instance, the word “apple” often appears near terms related to food, fruit, or trees, and sometimes near computing terms. The model picks up that apples can be red, green, or yellow (or occasionally other colors, if rare or rotten), that the word is spelled “a-p-p-l-e” in English, and that apples are edible. This statistical knowledge influences how the model responds when a user enters a prompt, shaping the output it generates based on the associations it “learned” from the training data.
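The sketch below illustrates that statistical idea in its crudest possible form: counting which words follow “apple” in a tiny made-up corpus and turning the counts into probabilities. Real LLMs learn far more general, distributed representations from trillions of tokens, so treat this purely as an analogy.

```python
from collections import Counter

# A tiny toy "corpus"; real training data runs to trillions of words.
corpus = (
    "the apple is red . the apple is green . "
    "the apple is edible . apple computers are popular"
).split()

# Count which words follow "apple" and turn the counts into probabilities.
# This bigram counting is a deliberately crude stand-in for what an LLM's
# billions of parameters encode in a far more general, distributed way.
followers = Counter(
    nxt for word, nxt in zip(corpus, corpus[1:]) if word == "apple"
)
total = sum(followers.values())
for word, count in followers.most_common():
    print(f"P({word!r} | 'apple') = {count / total:.2f}")
```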






