Researchers in the department have received a grant from Coefficient Giving for a project exploring security risks in next-generation AI systems that interact with digital environments through vision-language models (VLMs).
The project, Improving VLM Attack Transferability, is led by Dr Adel Bibi and Prof Phil Torr, with former postdoctoral researcher Dr Alasdair Paren also involved in the work. The award will support research until June 2027.
Dr Paren said: “For AI agents to be trusted with high-value tasks, they must be robust against attack. Significant effort is currently being directed toward defending against prompt injection, but adversarial image attacks on multimodal systems present a far harder detection and mitigation challenge. Fortunately, poor transferability renders these attacks largely infeasible against closed frontier systems for now. We aim to determine whether this is a limitation of current attack techniques or something more fundamental.”
Many modern AI agents rely on VLMs to interpret screenshots and other visual information before deciding how to act. The research will investigate whether apparently harmless public images, such as those found on websites, in advertisements, or in social media posts, could contain hidden malicious instructions capable of manipulating AI systems into unsafe behaviour.