Rather than assuming alignment always works, the company built a safety framework that plans for the worst case: AI agents that go off script.
Deepmind's AI Control Roadmap details how Google locks down its most advanced internal AI systems. The company thinks it could work as a blueprint for the rest of the industry, too.
The framework assumes that a highly capable AI agent might not share its operators' goals, and it plans accordingly. Deepmind compares it to a driving instructor with dual controls: The instructor trusts the student but keeps a hand near the wheel and a foot near the brakes. Same idea here. AI agents get only the permissions their verified behavior has earned, and trust builds gradually through controlled access.
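To make the idea of trust that builds gradually more concrete, here is a minimal Python sketch of permission tiers that an agent earns through verified tasks. It is an illustration only, not Deepmind's implementation: the tier names, the actions, the promotion threshold, and the rule that a failed check resets the agent to the lowest tier are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical tiers and actions, invented for illustration only.
TIERS = [
    ("sandbox", {"read_docs"}),
    ("limited", {"read_docs", "run_tests"}),
    ("trusted", {"read_docs", "run_tests", "merge_code"}),
]
PROMOTE_AFTER = 3  # assumed number of verified tasks needed to move up one tier


@dataclass
class AgentTrust:
    tier: int = 0             # index into TIERS; every agent starts at the bottom
    verified_streak: int = 0  # verified tasks since the last promotion or reset

    def record(self, verified: bool) -> None:
        """Update trust after one task has been checked."""
        if not verified:
            # Assumed policy: any failed check drops the agent to the lowest tier.
            self.tier = 0
            self.verified_streak = 0
            return
        self.verified_streak += 1
        if self.verified_streak >= PROMOTE_AFTER and self.tier < len(TIERS) - 1:
            self.tier += 1
            self.verified_streak = 0

    def allowed(self, action: str) -> bool:
        """Return True if the current tier grants the requested action."""
        return action in TIERS[self.tier][1]


agent = AgentTrust()
for _ in range(3):
    agent.record(verified=True)
print(TIERS[agent.tier][0], agent.allowed("merge_code"))
```

In this sketch a new agent starts in the most restricted tier, moves up one tier for every three verified tasks, and loses everything it has earned after a single failed check, which mirrors the instructor keeping a foot near the brakes.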
Deepmind models its AI agents as insider threats
Deepmind treats its internal AI agents like employees who already have office access but might work against the company's interests. The framework builds on the MITRE ATT&CK framework, a well-established cybersecurity tool that breaks potential attacks down into individual tactics and techniques. That lets Deepmind track risks systematically, spot suspicious behavior early, and test defenses in controlled exercises.
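A rough Python sketch shows what breaking behavior down into tactics can look like in practice. The tactic names come from MITRE ATT&CK, but the action names, the mapping, and the review threshold are hypothetical and do not reflect Deepmind's actual taxonomy or tooling.

```python
from collections import Counter

# Tactic names are taken from MITRE ATT&CK.
# The action names and this mapping are hypothetical, for illustration only.
ACTION_TO_TACTIC = {
    "read_credentials_file": "Credential Access",
    "disable_audit_log": "Defense Evasion",
    "request_admin_role": "Privilege Escalation",
    "upload_to_external_host": "Exfiltration",
}


def tag_events(events):
    """Return (action, tactic) pairs for every action that maps to a tactic."""
    return [(a, ACTION_TO_TACTIC[a]) for a in events if a in ACTION_TO_TACTIC]


def tactic_counts(events):
    """Count how often each tactic appears in a session's actions."""
    return Counter(tactic for _, tactic in tag_events(events))


def needs_review(events, max_distinct_tactics=1):
    """Flag a session whose actions span more distinct tactics than allowed."""
    return len(tactic_counts(events)) > max_distinct_tactics


session = ["run_tests", "request_admin_role", "disable_audit_log"]
print(tag_events(session))
print(needs_review(session))
```

Tagging each logged action with a tactic gives a defender a shared vocabulary: a session that touches several tactics at once can be flagged for review, and the same table can serve as a checklist for controlled test exercises.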