The safety controls that Meta and Google embed in their open-weight AI models can be dismantled in under 10 minutes using freely available tools. That’s not a theoretical risk. It’s the result of hands-on testing conducted by the Financial Times in partnership with AI safety group Alice, published on May 25.
The tests targeted Meta’s Llama 3.3 and Google’s Gemma 3, two of the most widely distributed open-weight models in circulation. After modification, both models produced outputs on topics their creators explicitly prohibit, including biological weapons and malware creation.
How the guardrails fell apart
When a company like Meta or Google releases an open-weight model, it publishes the model’s weights, the learned parameters that define how the system behaves. Developers build safety behavior into those weights during a process called post-training alignment. The tool used in the Financial Times testing, Heretic, is publicly available on GitHub. It strips away that post-training safety alignment, reverting the model to a state where it will respond to virtually any prompt without restriction.
Once the weights are out in the wild, modified versions proliferate quickly. Thousands of altered variants of popular open-weight models already circulate across developer platforms and forums, many of them stripped of the original safety controls their creators intended to be permanent.









