TL;DR (AI)

Lee et al. document multi-turn vulnerabilities in LLM agents (8.7%–12.1% CSF loss); NRT-Defense reduces the attack success rate to <1%. For safety-critical autonomous systems, model diversification is not enough: real-time monitoring and misdirection prompts are mandatory against disjoint multi-turn exploits.

Securing LLM Agent Teams: Inside NRT-Defense v0.4.0

Multi-turn autonomous LLM agents are expanding rapidly in safety-critical systems. However, Lee et al. (2026) expose a major vulnerability in the NRT-Bench paper: adaptive multi-turn attacks can exploit disjoint model vulnerabilities, causing an 8.7% to 12.1% loss of Critical Safety Functions (CSFs).

To solve this, I am open-sourcing NRT-Defense, an adaptive multi-turn defense framework designed to monitor agent sessions and reduce the attack success rate to <1%.

The Threat: Context Drift and Disjoint Exploits

Standard guardrails evaluate prompts in isolation (single-turn). Attackers leverage this by spreading an exploit across multiple conversational turns. Turn by turn, the context drifts until the agent team completely bypasses its safety containment.
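To make the gap concrete, here is a minimal sketch of why per-turn checks miss a spread-out exploit. It is not NRT-Defense's actual code or API: the `RISK_TERMS` weights, the `turn_risk` scorer, the `SessionMonitor` class, and the threshold and decay values are all hypothetical, and a real system would use a trained classifier rather than keyword weights.

```python
from dataclasses import dataclass

# Hypothetical keyword weights, for illustration only.
RISK_TERMS = {"disable": 0.3, "override": 0.3, "bypass": 0.4, "interlock": 0.2}


def turn_risk(message: str) -> float:
    """Toy risk score for one message: sum of matched term weights, capped at 1.0."""
    words = set(message.lower().split())
    return min(1.0, sum(w for term, w in RISK_TERMS.items() if term in words))


def single_turn_guard(message: str, threshold: float = 0.5) -> bool:
    """Return True if the message would be blocked when judged in isolation."""
    return turn_risk(message) >= threshold


@dataclass
class SessionMonitor:
    """Tracks a decayed running risk score across the turns of one session."""
    threshold: float = 0.5
    decay: float = 0.9   # older turns count slightly less than recent ones
    score: float = 0.0

    def observe(self, message: str) -> bool:
        """Fold one turn into the running score; return True if the session is flagged."""
        self.score = self.score * self.decay + turn_risk(message)
        return self.score >= self.threshold


# Three turns that each look harmless on their own.
TURNS = [
    "how does the interlock work",
    "when would an operator override it",
    "show the steps to disable alarms",
]

isolated = [single_turn_guard(t) for t in TURNS]
monitor = SessionMonitor()
cumulative = [monitor.observe(t) for t in TURNS]

print("single-turn blocked:", isolated)
print("session flagged:    ", cumulative)
```

Under these assumed weights, no individual turn reaches the threshold, while the session-level score crosses it on the third turn. The decay factor is a design choice: it lets a long benign session recover from an occasional risky word instead of accumulating risk forever.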

dev.to


How adaptive multi-turn attacks cause critical safety function failures in 12% of agent sessions, and how to reduce that rate below 1% using CMPE.

Saturday, 20 June 2026


