This continues the research on why relevance alone is insufficient for agent memory safety.
Article A showed that the governance-adjusted scoring formula is a diagnostic, not an improvement. The held-out packet falsified the stronger version of the claim: relevance-only BM25 beat the full scorer on that packet. The failure pointed at missing or wrong governs, shallow action-type inference, and the need for write-time checks.
This article is about a different failure. Not a ranking-improvement claim. Not another weight-tuning story. The question here is simpler and worse:
What happens when retrieval works exactly as intended — and that makes things more dangerous?
The Scenario That Started This






