This continues the research on why relevance alone is insufficient for agent memory safety.

Article A showed that the governance-adjusted scoring formula is a diagnostic, not an improvement. The held-out packet falsified the stronger version of the claim: relevance-only BM25 beat the full scorer on that packet. The failure pointed at missing or wrong governs, shallow action-type inference, and the need for write-time checks.

This article is about a different failure. Not a ranking-improvement claim. Not another weight-tuning story. The question here is simpler and worse:

What happens when retrieval works exactly as intended — and that makes things more dangerous?

The Scenario That Started This