EXCLUSIVE ChatGPT can’t tell its own generated content from attacker-controlled Markdown pulled from external sources, according to a researcher who found the prompt injection technique and reported it to OpenAI. This means that if a user asks the chatbot to summarize a web page that contains hidden instructions, the page can become the payload.An attacker could abuse this blind trust to inject phishing URLs into ChatGPT responses, or even trick the model into showing fake security alerts written in ChatGPT's own style, Permiso threat hunter Andi Ahmeti told The Register. In a report shared with us ahead of publication, Ahmeti also demonstrated how criminals could exploit this trust issue to pivot their attack from a victim’s browser to their mobile device by displaying an inline QR code. The victim scans the QR code with their phone and is taken to content hosted in an attacker-controlled S3 bucket, and this allows the baddie to bypass every desktop URL defense, including blocklists and password-manager domain checks, Ahmeti warned.

“AI systems increasingly render untrusted content directly inside browsers, which expands risk significantly,” he told us. “The bigger issue is that AI products are starting to resemble browser or operating system environments, which creates a much larger security surface.”