I got tired of guessing which model holds my VRAM, so I built a tiny dashboard


Tuesday, 26 May 2026


Quick story.

I run a small homelab — one box, an NVIDIA card, around ten Docker containers, and a couple of local model servers (Ollama mostly, vLLM when I'm playing around).

Every "why is this model OOM-ing" turned into the same five minutes of archaeology:

nvidia-smi → pick a PID

ps -o cgroup -p <PID> → find the container ID
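The first two lookups above can be scripted. What follows is a minimal sketch, not the dashboard's actual code: it assumes it runs on the host (so the PIDs reported by nvidia-smi match /proc), that containers are Docker containers whose 64-character hex ID appears in the process's cgroup path, and that the function names (`parse_compute_apps`, `container_id_from_cgroup`, `vram_by_container`) are hypothetical helpers invented for illustration.

```python
import re
import subprocess
from pathlib import Path

# Assumption: Docker's 64-hex-character container ID appears somewhere in the
# cgroup path. Other runtimes or cgroup layouts may need a different pattern.
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows into a list of (pid, MiB) tuples."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip blank lines and non-numeric values such as "[N/A]"
        rows.append((int(parts[0]), int(parts[1])))
    return rows


def container_id_from_cgroup(cgroup_text):
    """Return the first 64-hex container ID found in cgroup text, or None."""
    match = CONTAINER_ID_RE.search(cgroup_text)
    return match.group(0) if match else None


def vram_by_container():
    """Sum GPU memory (MiB) per container, keyed by the 12-char short ID."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    usage = {}
    for pid, mib in parse_compute_apps(out):
        try:
            cgroup_text = Path(f"/proc/{pid}/cgroup").read_text()
        except OSError:
            continue  # process exited or is not readable by this user
        cid = container_id_from_cgroup(cgroup_text)
        key = cid[:12] if cid else "host"  # no container ID: treat as host process
        usage[key] = usage.get(key, 0) + mib
    return usage


if __name__ == "__main__":
    # Print the biggest VRAM consumers first.
    for name, mib in sorted(vram_by_container().items(), key=lambda kv: -kv[1]):
        print(f"{name}\t{mib} MiB")
```

The short ID printed here is the same 12-character prefix that `docker ps` shows, so the output can be matched against container names by eye.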


