Which Agent Causes Task Failures and When? Researchers from PSU and Duke explore automated failure attribution of LLM Multi-Agent Systems

Share My Research is Synced’s column that welcomes scholars to share their own research breakthroughs with over 1.5M global AI enthusiasts. Beyond technological advances, Share My Research also calls for interesting stories behind the research and exciting research ideas. Contact us: chain.zhang@jiqizhixin.com

Thursday, 14 August 2025

Meet the authors

Institutions: Penn State University, Duke University, Google DeepMind, University of Washington, Meta, Nanyang Technological University, and Oregon State University. The co-first authors are Shaokun Zhang of Penn State University and Ming Yin of Duke University.

In recent years, LLM Multi-Agent systems have attracted widespread attention for their collaborative approach to solving complex problems. Yet these systems often fail at a task despite a flurry of activity, leaving developers with a critical question: which agent, and at what point, was responsible for the failure? Sifting through vast interaction logs to pinpoint the root cause is like finding a needle in a haystack, a time-consuming and labor-intensive effort.

This frustration is familiar to developers. As Multi-Agent systems grow more complex, failures become not only common but also hard to diagnose, because agents collaborate autonomously and information passes along long chains of interactions. Without a way to quickly identify the source of a failure, system iteration and optimization grind to a halt.

To address this challenge, researchers from Penn State University and Duke University, in collaboration with institutions including Google DeepMind, have introduced the novel research problem of “Automated Failure Attribution.” They have constructed Who&When, the first benchmark dataset for this task, and have developed and evaluated several automated attribution methods. The work highlights how complex the task is and opens a new path toward more reliable LLM Multi-Agent systems.
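
To make the question "which agent, and at what point?" concrete, here is a minimal sketch of the attribution task's shape: the input is a failed interaction log of (step, agent, message) records, and the output is a single (agent, step) pair. Everything below is a hypothetical illustration, not the authors' method or the Who&When data format: the names `Step`, `Attribution`, `attribute_by_scan` and `score` are invented for this example, the sequential scan is a naive baseline, and `toy_judge` is a keyword check standing in for what would realistically be an LLM judge.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Step:
    """One entry in a (hypothetical) multi-agent interaction log."""
    index: int
    agent: str
    content: str


@dataclass(frozen=True)
class Attribution:
    """The answer to 'which agent, at what point?'."""
    agent: str
    step: int


def attribute_by_scan(
    log: list[Step],
    is_decisive_error: Callable[[list[Step], Step], bool],
) -> Optional[Attribution]:
    """Naive baseline: walk the log in order and blame the first step
    the judge flags, given the history that preceded it."""
    for i, step in enumerate(log):
        if is_decisive_error(log[:i], step):
            return Attribution(step.agent, step.index)
    return None  # the judge found no culprit


def score(pred: Optional[Attribution], gold: Attribution) -> tuple[bool, bool]:
    """Return (agent correct, step correct) for one failed task."""
    if pred is None:
        return (False, False)
    return (pred.agent == gold.agent, pred.step == gold.step)


# Toy usage with an invented log and a keyword-based stand-in judge.
log = [
    Step(0, "Orchestrator", "Plan: find the 2023 population, then compute the ratio."),
    Step(1, "WebSurfer", "Found population: 1.2 million (2013 census)."),
    Step(2, "Assistant", "Ratio computed from 1.2 million."),
]


def toy_judge(history: list[Step], step: Step) -> bool:
    # Hypothetical: flags the step that used outdated data.
    return "2013" in step.content


pred = attribute_by_scan(log, toy_judge)
print(pred)  # Attribution(agent='WebSurfer', step=1)
print(score(pred, Attribution("WebSurfer", 1)))  # (True, True)
```

Scoring the agent and the step separately reflects the two halves of the question: a method can blame the right agent yet pick the wrong moment, which is exactly the kind of partial success a long log makes likely.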
