By Lance Eliot, Contributor.
In today’s column, I examine a newly published approach to interpreting what is occurring inside generative AI and large language models (LLMs).
The approach was developed by Anthropic, the famed maker of Claude, which has coined the term NLA (natural language autoencoders) for the new method. This approach is one of many being explored by AI researchers and AI practitioners worldwide. The hope is to find a suitable means of explaining how the numbers and numeric calculations internal to an LLM are capable of representing human concepts and human logic.
One of the biggest unknowns about modern-era AI is how these systems turn numbers into something exhibiting human-like intellectual tendencies. If you ask an LLM to explain itself, you might assume that you are getting an apt rendition of what the AI is computationally undertaking. Instead, you are often getting a charade, a made-up explanation that might have little or nothing to do with the actual internal machinations. This is known in the AI community as the AI interpretability problem.
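To make the autoencoder idea a bit more concrete, here is a rough sketch of the generic approach of training a small autoencoder on an LLM's internal activation vectors and then inspecting the learned latent units. The dimensions, the architecture, and the random stand-in data are my own illustrative assumptions; this shows the broad autoencoder-for-interpretability idea, not Anthropic's specific NLA method.

```python
# Generic, minimal sketch of autoencoder-based interpretability.
# Illustrative assumptions throughout -- NOT Anthropic's actual NLA method.
import torch
import torch.nn as nn

HIDDEN_DIM = 512    # assumed width of an LLM's internal activation vector
LATENT_DIM = 64     # assumed number of candidate "concept" units to learn

class ActivationAutoencoder(nn.Module):
    """Compress LLM activations into a small latent space and reconstruct them.

    The hope, in this generic approach, is that individual latent units end up
    aligning with human-recognizable concepts.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.decoder = nn.Linear(LATENT_DIM, HIDDEN_DIM)

    def forward(self, acts):
        latents = torch.relu(self.encoder(acts))   # candidate concept activations
        return self.decoder(latents), latents

# Stand-in data: random vectors in place of activations captured from a real LLM.
activations = torch.randn(1024, HIDDEN_DIM)

model = ActivationAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(200):
    reconstruction, latents = model(activations)
    loss = loss_fn(reconstruction, activations)   # reconstruct the original activations
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, one would examine which prompts make each latent unit fire,
# looking for units that track a single human-interpretable concept.
```

The design intuition is simply that if a compressed set of learned units can faithfully reconstruct the raw internal numbers, those units are plausible candidates for the human-level concepts the AI is representing.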