Meta and Google AI safety controls can be stripped in minutes, Financial Times testing finds

The safety controls that Meta and Google embed in their open-weight AI models can be dismantled in under 10 minutes using freely available tools. That’s not a theoretical risk. It’s the result of hands-on testing conducted by the Financial Times in partnership with AI safety group Alice, published on May 25.

The tests targeted Meta’s Llama 3.3 and Google’s Gemma 3, two of the most widely distributed open-weight models in circulation. After modification, both models produced outputs on topics their creators explicitly prohibit, including biological weapons and malware creation.

How the guardrails fell apart

When a company like Meta or Google releases an open-weight model, it is publishing the model’s weights, the learned parameters that define how the system behaves. Developers train safety behavior into those weights during a process called post-training alignment. The tool used in the Financial Times testing, Heretic, is publicly available on GitHub. It strips away that post-training safety alignment, reverting the model to a state where it will respond to virtually any prompt without restriction.

Once the weights are out in the wild, modified versions proliferate quickly. Thousands of altered variants of popular open-weight models already circulate across developer platforms and forums, many of them stripped of the original safety controls their creators intended to be permanent.
