AsgardBench: A benchmark for visually grounded interactive planning
AsgardBench evaluates whether embodied agents can revise their plans based on visual observations as tasks unfold. By focusing on perception-driven planning, it exposes key limitations and guides improvements in agent reliability.