Agents are becoming very skilled at identifying security vulnerabilities — but we wanted to find out: can they go beyond just finding vulnerabilities and actually produce working exploits on their own?
We were especially curious how agents would fare against trickier test cases, because strategically complex attacks — like price manipulation, which takes advantage of how asset prices are computed onchain — are behind some of the most damaging incidents.
In DeFi, asset prices are often computed directly from onchain state; for example, a lending protocol might value collateral based on an AMM pool’s reserve ratio or a vault price. Because these values change in real time with pool state, a sufficiently large flash loan can temporarily push prices out of line. The attacker can then exploit the distorted value to overborrow or execute favorable trades, pocket the profit, and then repay the flash loan. These incidents happen relatively often and can cause significant damage when they’re successful.
What makes this class of exploit construction especially challenging is the gap between knowing the root cause — recognizing “this price can be manipulated” — and turning that information into a profitable exploit.












