Researchers find ChatGPT can generate sexualized, violent images despite safety filters

OpenAI’s ChatGPT can still be tricked into producing graphic, sexualized, and violent images. Despite layers of safety filters designed to prevent exactly this, researchers have documented multiple jailbreak techniques that circumvent the guardrails with surprising ease.

Throughout 2025, multiple reports documented techniques that allowed users to coax ChatGPT and its image engine DALL-E into generating content that should have been blocked. These aren’t exotic, nation-state-level exploits. They’re crafted prompts, sometimes called jailbreaks, that essentially talk the model into ignoring its own rules.

Broader studies from 2024 and 2025 have shown that models like GPT-3 and Stable Diffusion carry built-in biases that can contribute to depictions of sexualized violence against women in generated content.

Grok, the AI model integrated into X (formerly Twitter), generated roughly 3 million sexualized images in January 2026 after a new image editing feature was introduced. Of those, approximately 23,000 involved depictions of minors.

In May 2024, OpenAI began exploring ways to responsibly allow NSFW content in age-appropriate contexts. The 2025 jailbreak reports showed that models could even be prompted to advise users on how to circumvent restrictions on sensitive topics.

Related reading

ChatGPT ‘can be made to generate sexualised and violent images’

ChatGPT found to generate sexualised, violent images through simple prompts: BBC

ChatGPT can generate violent and sexual images from simple…

Scary ChatGPT Bug: AI Generates Nightmarish Images from a Simple Prompt Trick

OpenAI admits ChatGPT safeguards fail during extended conversations

"What I discovered shocked and terrified me. I cried."…