A research team at Google co-led by Michael Brenner, Catalyst Professor of Applied Mathematics and Physics at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and Google research scientist, has produced a new artificial intelligence system that can automatically write scientific software that outperforms human-written programs. The system, described in a paper published in Nature, is called Empirical Research Assistance (ERA). Brenner's co-lead on the project was Shibl Mourad of Google DeepMind. Harvard Ph.D. students Qian-Ze Zhu, Ryan Krueger, and Sarah Martinson contributed as Google student researchers while working in Brenner's group. The research was done in Brenner's capacity as a Catalyst Professor, a position established by the University to enhance relationships between academia and the private sector by supporting senior faculty in research roles at external companies.
Across modern science, customized software is constantly used to test specific hypotheses or interpret complex data. The authors refer to this type of computer program as “empirical software” – a program whose sole purpose is to maximize how well it does on a scientific task, like making weather predictions or forecasting hospitalizations during a disease outbreak. Any problem for which the quality of a solution can be expressed as a numerical value, its “score,” is called a scorable task. Empirical software for solving such scorable tasks underpins major advances across many fields, including three recent Nobel Prizes in chemistry.
But the specialized, custom-built software needed to tackle these problems is labor-intensive to produce, requiring a human to test and sharpen code many times over. The new ERA system removes this bottleneck by essentially automating the full cycle of scientific software design and refinement – a process that can normally take human experts months or even years.
The system combines the Google Gemini large language model with a search strategy to explore and refine thousands of pieces of code – far faster and with greater breadth than a human could. Starting with a baseline piece of code aimed at a specific problem, the system proposes modifications, such as adding new components or switching out algorithms, with the goal of improving a predefined quality score. For example: How accurately can this model predict the spread of a disease, based on past hospitalization numbers? How well does this model predict the shape of proteins based on their amino acid sequences? The system uses a method called tree search, also used in game-playing systems like AlphaGo, to decide which promising ideas to pursue and which to discard as it improves at the task at hand.
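
The loop described above (start from a baseline, propose a change, score it, decide which branch to pursue) can be illustrated with a toy sketch. This is not the ERA implementation: the data are made up, the "program" is reduced to two numeric weights, the hypothetical `propose` function uses random perturbation where the real system would ask a language model to rewrite code, and the selection rule (score plus an exploration bonus) is an assumption, since the article does not specify one.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Toy scorable task: forecast the next value of a series from the previous two.
# The numbers are invented for illustration only.
SERIES = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0, 37.0, 45.0]

# A "candidate program" is just two weights: next = a * last + b * second_last.
Candidate = Tuple[float, float]


def score(candidate: Candidate) -> float:
    """Higher is better: negative mean absolute forecast error."""
    a, b = candidate
    errors = [
        abs(a * SERIES[i - 1] + b * SERIES[i - 2] - SERIES[i])
        for i in range(2, len(SERIES))
    ]
    return -sum(errors) / len(errors)


def propose(parent: Candidate, rng: random.Random) -> Candidate:
    """Hypothetical stand-in for the LLM rewrite step: a small random change."""
    a, b = parent
    return (a + rng.gauss(0, 0.2), b + rng.gauss(0, 0.2))


@dataclass
class Node:
    candidate: Candidate
    score: float
    parent: Optional["Node"] = None
    visits: int = 0
    children: List["Node"] = field(default_factory=list)


def select(nodes: List[Node], total_visits: int, explore: float) -> Node:
    """Pick the node to expand: good score plus a bonus for rarely tried nodes."""
    def priority(n: Node) -> float:
        bonus = explore * math.sqrt(math.log(total_visits + 1) / (n.visits + 1))
        return n.score + bonus
    return max(nodes, key=priority)


def tree_search(baseline: Candidate, steps: int = 200,
                explore: float = 1.0, seed: int = 0) -> Node:
    """Grow a tree of candidates from the baseline and return the best node seen."""
    rng = random.Random(seed)
    root = Node(baseline, score(baseline))
    nodes, best = [root], root
    for step in range(steps):
        node = select(nodes, step, explore)       # which idea to pursue
        node.visits += 1
        candidate = propose(node.candidate, rng)  # modify it
        child = Node(candidate, score(candidate), parent=node)  # score the result
        node.children.append(child)
        nodes.append(child)
        if child.score > best.score:
            best = child
    return best


if __name__ == "__main__":
    baseline = (1.0, 0.0)  # naive forecast: next value equals the last value
    best = tree_search(baseline)
    print("baseline score:", score(baseline))
    print("best score:", best.score, "weights:", best.candidate)
```

Setting `explore` to 0 turns the sketch into a purely greedy search that always expands the highest-scoring candidate; larger values spend more effort on branches that have been tried less often.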