Every vector database has the same problem: embeddings leak meaning.

If someone gets access to your vector store (breach, insider, subpoena), they don’t need to read your documents. They just cluster the embeddings and look at what each cluster sits close to. Five minutes later they know: these 500 vectors are medical records, these 200 are legal cases, these 100 are salary data.
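
To make the attack concrete, here is a minimal sketch of the clustering step. Everything in it is an assumption for illustration: the vectors are synthetic stand-ins (three hidden "topics" of 500, 200 and 100 points, mirroring the numbers above), and the helper names `make_fake_embeddings`, `spherical_kmeans` and `purity` are hypothetical. It uses plain k-means on cosine similarity, which is one common choice, not necessarily what a real attacker would run.

```python
import math
import random
from collections import Counter


def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_fake_embeddings(sizes, dim=16, noise=0.1, seed=0):
    """Synthetic stand-in for a vector store (assumption: one direction per topic)."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for topic, size in enumerate(sizes):
        center = [1.0 if i == topic else 0.0 for i in range(dim)]
        for _ in range(size):
            vectors.append(normalize([c + rng.gauss(0, noise) for c in center]))
            labels.append(topic)
    return vectors, labels


def spherical_kmeans(vectors, k, iters=20):
    """k-means on unit vectors using cosine similarity; returns a cluster id per vector."""
    # Farthest-point initialisation: each new seed is the vector least similar
    # to the seeds chosen so far.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(min(vectors, key=lambda v: max(dot(v, c) for c in centers)))
    assign = []
    for _ in range(iters):
        # Assign every vector to its most similar center.
        assign = [max(range(k), key=lambda j: dot(v, centers[j])) for v in vectors]
        # Move each center to the normalized mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centers[j] = normalize([sum(col) for col in zip(*members)])
    return assign


def purity(assign, labels):
    """Fraction of vectors whose cluster's majority topic matches their own topic."""
    by_cluster = {}
    for a, l in zip(assign, labels):
        by_cluster.setdefault(a, Counter())[l] += 1
    return sum(c.most_common(1)[0][1] for c in by_cluster.values()) / len(labels)


vectors, labels = make_fake_embeddings([500, 200, 100])
assign = spherical_kmeans(vectors, k=3)
print("cluster sizes:", sorted(Counter(assign).values()))
print("purity vs. hidden topics:", purity(assign, labels))
```

The attacker never sees `labels`; they are only used here to score how well the clusters line up with the hidden topics. Real embeddings are noisier than this toy data, so treat the sketch as a picture of the mechanism rather than a measurement.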

I wanted to know: can you destroy that structure while keeping search working?

The experiment

I took 626,906 real passages from Microsoft’s MS MARCO dataset and encoded them with a standard sentence transformer. Then I tried to make the embeddings unreadable without killing retrieval quality.