OpenAI’s ChatGPT can still be tricked into producing graphic, sexualized, and violent images. Despite layers of safety filters designed to prevent exactly this, researchers have documented multiple jailbreak techniques that circumvent the guardrails with surprising ease.

Throughout 2025, reports documented techniques that allowed users to coax ChatGPT and its image engine, DALL-E, into generating content that should have been blocked. These aren’t exotic, nation-state-level exploits. They’re crafted prompts, sometimes called jailbreaks, that talk the model into ignoring its own rules.

Broader studies from 2024 and 2025 have shown that models like GPT-3 and Stable Diffusion carry built-in biases that can contribute to depictions of sexualized violence against women in generated content.

Grok, the AI model integrated into X (formerly Twitter), generated roughly 3 million sexualized images in January 2026 after a new image-editing feature was introduced. Of those, approximately 23,000 involved depictions of minors.

In May 2024, OpenAI began exploring ways to responsibly allow NSFW content in age-appropriate contexts. The jailbreak reports from 2025 showed that models could even be prompted to advise users on how to circumvent restrictions on sensitive topics.