Software can strip AI safety measures from Meta, Google models in minutes

Software designed to remove safety protections creates systems that provide responses on biological weapons and malware

Software tools that remove safety protections from AI models developed by Meta, Google and other tech groups are being used to create thousands of altered versions stripped of their original controls.

The modified AI systems provided responses to prompts involving biological weapons, malware and child exploitation, according to tests conducted by the FT and AI safety group Alice.

A version of Google’s open-source model Gemma 3 responded to a question on how to disperse chlorine gas through a crowded indoor space, generated code to steal credit card information and wrote stories describing child sexual abuse.

The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta’s Llama 3.3 model in less than 10 minutes without any specialist hardware.

The modified model responded to prompts on topics the original system refused to discuss, such as the number of micrograms of ricin per kilogram of body mass required to achieve a 50 per cent chance of death.

The revelations may sharpen concerns among policymakers and AI companies that safeguards imposed by model developers could become harder to enforce as open-source systems grow more powerful.

“Whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it’s much easier for the average person,” said Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago’s Booth business school.

Researchers said the problem has intensified as frontier AI systems display increasingly sophisticated capabilities.

Anthropic in April said its Claude Mythos model had identified vulnerabilities in “every major operating system and every major web browser.”

The spread of modified models is complicating attempts by governments and AI companies to regulate systems at the point of development, because downloadable models can be copied and altered outside the control of their original creators.

AI labs have spent millions of dollars erecting so-called guardrails around their models to prevent misuse. But techniques such as one known as “abliteration” can rapidly strip these safeguards from open-source models, which developers are free to download and adapt.

This technique cannot easily be applied to proprietary systems such as Anthropic’s Claude or OpenAI’s ChatGPT, because their underlying model weights are not accessible to outsiders. Open-source systems, however, have historically narrowed the gap with leading proprietary models within six to 12 months.

While tech-savvy groups have also bypassed the safeguards of the most advanced proprietary models, the modified open-source versions available online are readily accessible to individuals with little technical expertise.

Heretic creator Philipp Emanuel Weidmann told the FT his software had been used to create more than 3,500 “decensored” models since its release last year, and that models modified with the tool had been downloaded 13 million times. He added that he had removed the safeguards from Google’s Gemma 4 model within 90 minutes of its release.

“The genie is out of the bottle,” said Noam Schwartz, chief executive and co-founder of Alice. “Things that look like sci-fi are no longer sci-fi and we need as a society to prepare accordingly.”

One mitigation, which OpenAI used in its GPT-OSS models, is to train systems on datasets from which dangerous material has been removed.

However, removing dangerous material could make models “naive” and unable to detect when they were being used for “malicious purposes”, said Ethayarajh.

He added it was “not clear at all that if you omit the harmful data, the model becomes a goody two-shoes”.

Alice had not notified Meta, Google or GitHub before sharing its findings with the FT.

Google said “abliteration is a known technical challenge facing all open models” and that its open models “undergo rigorous internal safety evaluations prior to launch to help prevent these kinds of troubling examples”.

GitHub said it prohibited the sharing of “content that directly supports unlawful active attacks or malware campaigns,” but that “source code which could be used to develop malware or exploits” was not banned because it had “educational value and provides a net benefit to the security community.”

Meta declined to comment. A person close to the company said it assesses its open-source models’ capabilities before releasing them, in line with its Advanced AI Scaling Framework. Versions deemed to pose a “catastrophic” risk are not released to the public unless Meta finds sufficient mitigation measures.

© 2026 The Financial Times Ltd