Every number below was measured on a single RTX 5080 (16 GB) and is reproducible
from the repo. Each result states the exact config it was measured under; I don't
compare numbers across configs, and I flag anything we did **not** cleanly measure.

TL;DR

You can serve several small chat LLMs from one 16 GB RTX 5080, behind a single






