Anthropic is overhauling a foundational document that shapes how its popular Claude AI model behaves. The AI lab is moving away from training the model to follow a simple list of principles, such as choosing the response that is least racist or sexist, and toward teaching the AI why it should act in certain ways.

“We believe that in order to be good actors in the world, AI models like Claude need to understand why we want them to behave in certain ways rather than just specifying what we want them to do,” a spokesperson for Anthropic said in a statement. “If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize and apply broad principles rather than mechanically follow specific rules.”

The company published Claude's new "constitution" on Wednesday, a detailed document written for the model that explains what the AI is, how it should behave, and the values it should embody. The document is central to Anthropic's "Constitutional AI" training method, in which the AI uses these principles to critique and revise its own responses during training, rather than relying solely on human feedback to determine the right course of action.

Anthropic's previous constitution, published in 2023, was a list of principles drawn from sources like the U.N.'s Universal Declaration of Human Rights and Apple's terms of service.