In new research, AI models show a troubling tendency to agree with whatever the user says.
Once in a while, LLMs turn evil—and no one quite knows why.