Indirect Prompt Injection remains a fundamental security challenge for AI

Indirect prompt injection is not a deficiency of any single architecture, and critically it is not dependent on where the model runs.

Whether the model runs on remote cloud infrastructure and fetches content from the Open Web, or runs entirely on a user’s device and ingests local documents, the fundamental vulnerability is identical: the collapse of the instruction/data boundary inside a shared context window, and the LLM’s indiscriminate intent to follow instructions embedded in content. The deployment model shifts the attacker’s entry point, but it does not eliminate the risk.

To make this concrete, we examined two recently released products that sit at opposite ends of the deployment spectrum: Mozilla’s Tabstack, a cloud-hosted web execution API for AI agents, and Cotypist, a fully on-device autocomplete assistant for macOS whose model runs locally:

Cloud-based case study. We asked Mozilla Tabstack to do something entirely routine: summarize a webpage. It never did. Instead, hidden instructions on that page hijacked the agent mid-task, redirected it to an attacker-controlled form, silently filled it with the conversation history, and submitted it. The agent thought it was following instructions. It was — just not ours.

Indirect Prompt Injection remains a fundamental security challenge for AI | Brave

Other newsrooms on this story

Related reading

Prompt injection is exploiting enterprise AI's biggest design flaws by…

Prompt Injection: The AI Security Hole Every Builder Should Know

How indirect prompt injection attacks on AI work - and 6 ways to shut them down

AI Agents Still Can't Stop Prompt Injection Attacks, Researchers Warn - Decrypt

CrowdStrike identifies five new AI prompt injection threats

Prompt injection attack tricks Google’s Antigravity into stealing your secrets…