A friend of mine runs security at a mid-sized fintech. About 400 engineers, Series D, the kind of place where the AI strategy memo got written in a weekend and shipped to prod the following Tuesday. She called me in March, somewhere between annoyed and panicked, because her board had asked a simple question: "What models are we running, and where did they come from?"
She had answers. They were wrong.
The official answer was "GPT-4 and Claude, through the standard gateway." The actual answer, after two weeks of digging, was seventeen models. A Llama 3.1 8B fine-tune hosted on a self-managed vLLM box that someone on the ML team spun up for a customer-support prototype, then forgot to tear down. Three Hugging Face embedding models pulled at container build time with no pinned hashes. A Whisper variant running on a developer's GPU workstation that was, somehow, reachable from the staging VPC. Two LoRA adapters fine-tuned on customer support tickets, sitting in an S3 bucket with a permissive policy. And a llama.cpp build serving a 7B model to an internal Slack bot that nobody could remember authorizing.
The part that got me, when she walked me through it, was that none of this was shadow IT in the old sense. Every one of these systems had a Jira ticket. Every one had been "approved." The inventory just didn't exist. There was no single place she could point at and say: this is what we run, this is where the weights came from, this is what data trained them, this is what they're allowed to talk to.
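To make those four questions concrete, here is a minimal sketch of what one row in such an inventory might hold. This is my illustration, not anything her team built: the `ModelRecord` class, its field names, the `gaps` helper, and the example values are all hypothetical. I've added a pinned-hash field as an assumption, since unpinned pulls were part of the problem.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelRecord:
    """One row in a hypothetical model inventory (all field names are illustrative)."""
    name: str                        # what we run
    weights_source: str              # where the weights came from ("" if unknown)
    weights_sha256: Optional[str]    # pinned hash of the weights, None if unpinned
    training_data: Tuple[str, ...]   # what data trained or fine-tuned it
    allowed_peers: Tuple[str, ...]   # what it is allowed to talk to


def gaps(record: ModelRecord) -> List[str]:
    """Return the names of the fields this record cannot answer yet."""
    missing = []
    if not record.weights_source:
        missing.append("weights_source")
    if record.weights_sha256 is None:
        missing.append("weights_sha256")
    if not record.training_data:
        missing.append("training_data")
    if not record.allowed_peers:
        missing.append("allowed_peers")
    return missing


# Hypothetical entry for something like the Slack bot: we know what it
# talks to, and almost nothing else.
slack_bot = ModelRecord(
    name="slack-bot-7b",             # made-up label
    weights_source="",               # nobody remembers
    weights_sha256=None,             # never pinned
    training_data=(),                # unknown
    allowed_peers=("internal-slack",),
)

print(gaps(slack_bot))
```

A record like this answers nothing by itself. The point of the sketch is that an empty field becomes a visible, countable gap instead of something you discover two weeks into an audit.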












