Palo Alto Unit 42 Caught Indirect Prompt Injection in the Wild — Here's What Your Agent Firewall Needs to Stop It

Palo Alto Networks Unit 42 published something the AI community has been nervously waiting for: confirmed, real-world indirect prompt injection attacks against LLM-powered agents. Not a CTF. Not a research demo. Adversaries embedding malicious instructions into web content that AI agents browse, causing them to execute unintended actions up to and including fraud.

If you're shipping an agentic system that touches the web — a research agent, a browser-use workflow, a customer-facing assistant that fetches external content — this is your threat model, active now.

What Actually Happened

Unit 42 documented agents processing web content as part of their normal workflow — fetching pages, reading results, incorporating that content into their context. Attackers embedded hidden instructions into that web content. When the agent ingested the page, it also ingested the adversarial payload. The agent then executed those instructions as if they came from a legitimate principal.

The impact: high-severity fraud-class actions. The mechanism: the agent couldn't distinguish between "content I was sent to retrieve" and "instructions I should follow." From the model's perspective, both look like text in its context window.

Palo Alto Unit 42 Caught Indirect Prompt Injection in the Wild — Here's What Your Agent Firewall Needs to Stop It

Other newsrooms on this story

Related reading

Google ADK Security: 5 Layers That Defend AI Agents From Prompt Injection

AI Agents Still Can't Stop Prompt Injection Attacks, Researchers Warn - Decrypt

How indirect prompt injection attacks on AI work - and 6 ways to shut them down

Prompt injection is exploiting enterprise AI's biggest design flaws by…

Three prompt injection stories from this week that your guardrail probably…

Prompt injection attack tricks Google’s Antigravity into stealing your secrets…