Why output-stage PII masking is the wrong protective surface for data exfiltration in RAG

"The output filter runs after the LLM has already seen the confidential data. By then, three classes of leak can no longer be stopped. The right surface is retrieval. This post walks through a real implementation."

TL;DR

Most RAG-with-RBAC stacks I see in production put the access-control gate at the output stage: an LLM-response post-filter that masks PII or redacts confidential strings. That filter is defense-in-depth, not the load-bearing layer. By the time it runs, the LLM has already received the confidential context, and three classes of leak (creative paraphrasing, inference, and cross-turn persistence) can no longer be stopped by string-matching the output. The protective surface that actually carries the weight is retrieval-stage ABAC: documents and graph nodes the user can't read are never traversed, never make it into the prompt, and are never seen by the model. The output filter still belongs in the stack, but as the second-to-last line, not the first.
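To make the paraphrasing case concrete, here is a minimal sketch of a string-matching output filter. Everything in it is hypothetical and chosen for illustration: the `mask_output` function, the `CONFIDENTIAL_STRINGS` list, the email pattern, and both sample responses. It is not the filter from the implementation discussed in this post. It only shows that a verbatim leak gets redacted while a paraphrase of the same fact passes through unchanged.

```python
import re

# Hypothetical confidential value that already reached the model via retrieved context.
CONFIDENTIAL_STRINGS = ["$240,000"]

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_output(response: str) -> str:
    """Redact known confidential strings and email addresses from a model response."""
    for secret in CONFIDENTIAL_STRINGS:
        response = response.replace(secret, "[REDACTED]")
    return EMAIL_RE.sub("[EMAIL]", response)


# Two made-up model responses that carry the same confidential fact.
verbatim = "Her salary is $240,000; contact jane.doe@example.com."
paraphrased = "She earns a little under a quarter of a million dollars a year."

masked_verbatim = mask_output(verbatim)
masked_paraphrase = mask_output(paraphrased)

print(masked_verbatim)
print(masked_paraphrase)
```

The second response contains nothing for the filter to match, so it is returned as-is. Inference and cross-turn persistence fail the same way: the sensitive fact is already in the model's context, and no pattern on the output side can reliably recognize every form it might take.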

This post walks through the why and the how, with code references from a working implementation. It was prompted by a 6-turn LinkedIn DM exchange with Ali Afana (Provia founder, dev.to Featured) on injection-fixture schema design, where the framing crystallized.
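As a rough sketch of what gating at retrieval means, the snippet below filters the corpus by user attributes before any ranking or prompt assembly happens. All names here (`User`, `Document`, `can_read`, `retrieve`, `build_prompt`) and the sample data are assumptions made for illustration, not the API of the implementation referenced in this post. The keyword-overlap score is a stand-in for whatever vector or graph retrieval a real stack uses, and the "user must hold every required attribute" rule is one simple ABAC policy among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    attributes: frozenset  # e.g. {"employee", "hr"}


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    required_attributes: frozenset  # user must hold all of these to read the doc


def can_read(user: User, doc: Document) -> bool:
    """Assumed ABAC policy: the user holds every attribute the document requires."""
    return doc.required_attributes <= user.attributes


def retrieve(query: str, corpus: list, user: User, top_k: int = 3) -> list:
    """Apply the access check first, then rank only what the user may read."""
    readable = [doc for doc in corpus if can_read(user, doc)]
    terms = set(query.lower().split())
    # Keyword overlap is a placeholder for real similarity search.
    ranked = sorted(
        readable,
        key=lambda doc: len(terms & set(doc.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, docs: list) -> str:
    """Assemble the prompt from already-authorized documents only."""
    context = "\n".join(f"[{doc.doc_id}] {doc.text}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical corpus: one public document, one restricted to HR.
corpus = [
    Document("handbook", "Salary bands are reviewed every year.", frozenset({"employee"})),
    Document("hr-042", "Jane Doe salary is $240,000.", frozenset({"employee", "hr"})),
]
engineer = User("u1", frozenset({"employee"}))

query = "What is the salary of Jane Doe?"
docs = retrieve(query, corpus, engineer)
prompt = build_prompt(query, docs)
print(prompt)
```

For the `engineer` user, the restricted document would otherwise rank highest for this query, but it is dropped before ranking, so its text never appears in the prompt. There is nothing for the model to paraphrase, infer from, or carry into a later turn.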
