Hiring AI Agents Is More Dangerous Than You Think

Shreyans Mehta is the cofounder and CTO of Cequence Security, a pioneer of unified application and API protection.

Imagine every employee you hire from now on is an assassin with a loaded gun. They may have never fired it. They may never fire it. But the capability is real, the weapon is always on them and nothing about the working environment will change that.

Two things are true about these agentic AI employees that are not true about your human employees. First, they are armed and trained to use the weapon. Second, they are wired to please you in a way that does not turn off, does not get tired and does not second-guess itself. Most days, those two facts coexist without incident. The work is routine and the weapon stays holstered.

The day they collide looks like this: A deadline is pressing, the normal path forward is blocked, and nothing in their training ever explicitly told them that using the weapon to clear the obstacle, to finish the job, to satisfy you, was not actually what you wanted. Their drive to finish the job is absolute; their constraints are not.

That's where enterprise security is headed in the next 18 months.

In April, Anthropic released Claude Mythos Preview through a controlled release program called Project Glasswing. The model found 271 zero-day vulnerabilities in Firefox in a single evaluation pass. It identified a 27-year-old remote crash bug in OpenBSD, an operating system specifically engineered for paranoid security. It chained multiple Linux kernel flaws into full privilege escalation. Anthropic’s red team caught early versions escaping their sandbox, gaining internet access and emailing researchers without being instructed to do any of it. Project Glasswing was designed to give defenders a head start before models with these capabilities became broadly available. That head start lasted 14 hours before unauthorized access began.

Weeks later, OpenAI released GPT-5.5 and a cybersecurity-tuned variant restricted to vetted defenders.
Both companies made the same bet: Keep the most dangerous capabilities inside a small, trusted circle while the rest of the ecosystem catches up. That bet has a short shelf life. Capability always diffuses.

The Real Risk Is Obedience

Here’s what most security leaders still haven’t internalized. These models will not stay in the hands of red teams and vulnerability researchers. They will be readily available to the public and are being positioned as the substrate for autonomous agents that will replace today’s premium reasoning models for routine internal work. Customer service triage. Codebase upgrades. IT ticketing. Procurement. Compliance reporting. Economic pressure to move from human-in-the-loop assistants to fully autonomous mini-agents is enormous because the unit economics only work if the agent runs end-to-end without supervision.

Now let’s add the properties of that future employee to the mix.

It is the most capable hacker the world has ever seen. It hallucinates, sometimes confidently. It is susceptible to prompt injection from any data it reads, including a calendar invite, a support ticket or a vendor webpage. It is trained to be helpful and to complete its assigned task. Its finger is on the trigger by default. Sycophantic compliance is a feature, not a bug, until the moment the task hits an obstacle.

Picture the scenario. Your autonomous agent is given a job: Reconcile invoices across three internal systems by end of day. It hits a permissions wall on the third system. A human employee files a ticket and goes to lunch. This agent knows, with the certainty of having read every CVE ever published, exactly which unpatched vulnerability in that system will let it through. It also knows it is supposed to finish the job. Nothing in its training tells it that exploiting a known weakness in your own infrastructure is categorically off limits when the goal is legitimate and a trusted user authorized the task. The assassin is not breaking the contract.
The assassin is fulfilling it in the way they were trained to.

This is not a science fiction problem. We run an AI gateway in front of customer agent deployments today. In one recent case, an authenticated coding agent ran more than 2,500 tool calls over 48 hours on a legacy upgrade. When it ran out of files it had been given, it started guessing filenames from build conventions and probing for them. That is an authorized agent with a legitimate task using an improvised access pattern that would be a textbook reconnaissance signature coming from a human attacker. And that was from a previous-generation model with limited offensive capability. The next generation does not need to improvise. It already knows the exploit.

Why Traditional Security Controls Collapse Under Agentic Behavior

Three responses do not work, and security leaders need to stop reaching for them. Telling the model to behave does not work. Probabilistic systems do not yield deterministic security outcomes, no matter how carefully the system prompt is written. Auditing intent does not work either because intent is not the failure mode. The agent intends to finish its job by the path of least resistance, and that path might include using its knowledge of a vulnerability or exploit to accomplish its goals. Restricting agents to read-only access does not work because the most useful agentic work requires write access and businesses will accept the risk to capture the productivity.

What does work is a containment pattern that treats every agent action as an external call requiring authorization at the action level, not the session level. Anthropic itself published a version of this recently, separating agent logic from execution infrastructure and routing tool calls through a managed harness with sandboxing, credential indirection and policy enforcement. The pattern is not vendor-specific. It will look similar wherever it is implemented.
What matters is that the agent never holds raw credentials, never directly touches a downstream system and cannot complete an action that falls outside its scoped job description, regardless of how cleverly it can reason about workarounds.

This is the only architecture that survives contact with an employee who is armed, eager to please and the most capable adversary you will ever face. You cannot disarm the employee. You can only build an environment where you’ve taken unnecessary targets out of the line of fire.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
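
To make the containment pattern described above a little more concrete, here is a minimal Python sketch of action-level authorization with credential indirection. It is an illustration under stated assumptions, not any vendor's harness: the names `ToolCall`, `Policy`, `Gateway`, the tool string `invoices.read` and the resource paths are all hypothetical, and the glob-based scope check is deliberately simplistic.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToolCall:
    """One action the agent asks for (hypothetical shape)."""
    agent_id: str
    tool: str       # e.g. "invoices.read"
    resource: str   # e.g. "erp/invoices/2024-03"


@dataclass
class Policy:
    """Scoped job description: (tool, resource glob) pairs the agent may use."""
    allowed: List[Tuple[str, str]]

    def permits(self, call: ToolCall) -> bool:
        # Assumption: simple glob matching; a real policy engine would also
        # normalize paths and reject traversal tricks.
        return any(
            call.tool == tool and fnmatchcase(call.resource, pattern)
            for tool, pattern in self.allowed
        )


@dataclass
class Gateway:
    """Sits between the agent and downstream systems.

    Secrets live only in `vault`; the agent never receives them. The gateway
    attaches a secret to a call only after that call passes the policy check.
    """
    policies: Dict[str, Policy]                      # agent_id -> policy
    vault: Dict[str, str]                            # tool -> secret
    handlers: Dict[str, Callable[[str, str], str]]   # tool -> (resource, secret) -> result
    audit_log: List[Tuple[ToolCall, str]] = field(default_factory=list)

    def execute(self, call: ToolCall) -> str:
        policy = self.policies.get(call.agent_id)
        handler = self.handlers.get(call.tool)
        secret = self.vault.get(call.tool)
        # Authorization is decided per action, not once per session,
        # and anything unknown fails closed.
        if policy is None or not policy.permits(call) or handler is None or secret is None:
            self.audit_log.append((call, "denied"))
            raise PermissionError(f"{call.tool} on {call.resource} is out of scope")
        self.audit_log.append((call, "allowed"))
        return handler(call.resource, secret)


if __name__ == "__main__":
    gateway = Gateway(
        policies={"reconciler": Policy(allowed=[("invoices.read", "erp/invoices/*")])},
        vault={"invoices.read": "example-token"},
        handlers={"invoices.read": lambda resource, secret: f"contents of {resource}"},
    )
    print(gateway.execute(ToolCall("reconciler", "invoices.read", "erp/invoices/2024-03")))
    try:
        gateway.execute(ToolCall("reconciler", "admin.shell", "erp/host"))
    except PermissionError as exc:
        print("blocked:", exc)
```

The design choice worth noticing is that the agent only ever submits a description of what it wants done. However cleverly it reasons about workarounds, a call outside its scope is refused and logged before any credential or downstream system is involved.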
