Securing LLM Agent Teams: Inside NRT-Defense v0.4.0
Multi-turn autonomous LLM agents are expanding rapidly in safety-critical systems. However, a major vulnerability has been exposed by Lee et al. (2026) in the NRT-Bench paper: adaptive multi-turn attacks can exploit disjoint model vulnerabilities, causing a 8.7% to 12.1% loss of Critical Safety Functions (CSFs).
To solve this, I am open-sourcing NRT-Defense, an adaptive multi-turn defense framework designed to monitor agent sessions and reduce the attack success rate to <1%.
The Threat: Context Drift and Disjoint Exploits
Standard guardrails evaluate prompts in isolation (single-turn). Attackers leverage this by spreading an exploit across multiple conversational turns. Turn by turn, the context drifts until the agent team completely bypasses its safety containment.








