"The output filter runs after the LLM has already seen the confidential data. By then, three classes of leak can no longer be stopped. The right surface is retrieval. Walking through a real implementation."

TL;DR

Most RAG-with-RBAC stacks I see in production put the access-control gate at the output stage: an LLM-response post-filter that masks PII or redacts confidential strings. This is defense-in-depth, not the load-bearing layer. By the time the filter runs, the LLM has already received the confidential context, and three classes of leak — creative paraphrasing, inference, cross-turn persistence — can no longer be stopped by string-matching the output. The protective surface that actually carries the weight is retrieval-stage ABAC: documents and graph nodes the user can't read are never traversed, never make it into the prompt, never seen by the model. The output filter still belongs in the stack, but as the second-to-last line, not the first.

This post is a walk through why and how, with code references from a working implementation. It was prompted by a 6-turn LinkedIn DM exchange with Ali Afana (Provia founder, dev.to Featured) on injection-fixture schema design, where the framing crystallized.