I Fired 49 Attack Prompts at an AI. 25 of Them Worked.

By Naren Ranjith I had no coding experience six months ago. I'd been reading about AI security —...

sabato 27 giugno 2026 New tab

1,654 words~8 min read

By Naren Ranjith

I had no coding experience six months ago.

I'd been reading about AI security — specifically about something called prompt injection, ranked #1 on OWASP's official list of AI security risks. The idea is simple: you craft a message that tricks an AI into ignoring its instructions and doing something it shouldn't. Security researchers had been publishing attack success rates of 50–84% against real AI systems.

I wanted to know if that was actually true. So I built a tool to find out.

This is the story of AgentProbe — what I built, how it works, and what it found.

I Fired 49 Attack Prompts at an AI. 25 of Them Worked.

I Fired 49 Attack Prompts at an AI. 25 of Them Worked.

Other newsrooms on this story

Related reading

I Built a Prompt Injection Detector with 98% Recall on Unseen Attacks. Here's…

Google ADK Security: 5 Layers That Defend AI Agents From Prompt Injection

The Attack Vectors Nobody Tells You About: Hardening LLM Apps Against Prompt…

How indirect prompt injection attacks on AI work - and 6 ways to shut them down

AI Prompt Injection, Drupal SQLi Exploitation, and Nmap for Hardening

I Built an AI Tool That Turns Bad Prompts Into Expert-Level AI Responses

Other newsrooms on this story

Related reading

I Built a Prompt Injection Detector with 98% Recall on Unseen Attacks. Here's…

Google ADK Security: 5 Layers That Defend AI Agents From Prompt Injection

The Attack Vectors Nobody Tells You About: Hardening LLM Apps Against Prompt…

How indirect prompt injection attacks on AI work - and 6 ways to shut them down

AI Prompt Injection, Drupal SQLi Exploitation, and Nmap for Hardening

I Built an AI Tool That Turns Bad Prompts Into Expert-Level AI Responses