Quick story.

I run a small homelab — one box, an NVIDIA card, around ten Docker containers, and a couple of local model servers (Ollama mostly, vLLM when I'm playing around).

Every "why is this model OOM-ing" turned into the same five minutes of archaeology:

nvidia-smi → pick a PID

ps -o cgroup -p → find the container ID