Every enterprise AI conversation right now starts in the same place: "connect the model to our data." Then it stalls in the same place: which data, copied where, governed by whom.

I build retrieval for a living (I wrote the original open-source SWIRL), so let me make an argument that runs against the current default - and then show the architecture it implies.

The default is a second copy of your data

The standard RAG recipe is: crawl your sources, chunk them, embed them, and load the vectors into a database. Now your model can retrieve. It also means you have a second copy of your content living in an index you have to secure, keep in sync, and explain to whoever owns compliance. You've recreated every permission boundary by hand, and you'll eventually get one wrong.

For a lot of teams that copy is simply not allowed. Regulated content, client-confidential material, anything privileged - copying it into a vendor store is exposure you don't get paid to take on.