Two years ago, researchers at MIT proposed a provocative idea: As AI models become more powerful, they begin to see the world in the same way. But not everyone was convinced, and now EPFL scientists have shown that the picture is more nuanced.

Today's artificial intelligence systems learn by being trained on massive datasets. Language models are trained on text, vision models on images and video, and audio models on sound data. Yet these different data types are often grounded in the same reality. In 2024, researchers at the Massachusetts Institute of Technology argued that, as models grow more capable, the way they see the world grows more similar, no matter what data they are trained on (images, text, video or audio).

Their Platonic Representation Hypothesis, named for Plato's ideal forms, suggested that these systems were discovering the same underlying structure of the world.

The idea quickly captured the imagination of the AI community and raised profound questions. If different AI systems independently arrive at the same internal view of reality, does this reveal something fundamental about intelligence itself?

Measuring distances between concepts