New Anthropic research shows that undesirable LLM traits can be detected—and even prevented—by examining and manipulating the model’s inner workings.