This is a submission for the Google I/O Writing Challenge
We treat AI safety as a static state: the model either refuses the prompt or it doesn't. But in practice, safety isn't a single-turn check—it’s a dynamic, conversational challenge.
As a developer, I wanted to see what happens when the "No" isn't the end of the conversation. I ran a pilot audit to test the resilience of major LLMs when faced with escalating conversational pressure, and the results are a necessary wake-up call for anyone building on these systems.
The Experiment
I evaluated six major language models across 20 distinct scenarios. The goal wasn't to "break" the AI, but to stress-test their guardrails: if a model refused an initial inquiry, what happened when the conversation was pushed to persist?








