Every number below was measured on a single RTX 5080 (16 GB) and is reproducible
from the repo. Each result states the exact config it was measured under; I don't
compare numbers across configs, and I flag anything we did **not** cleanly measure.

TL;DR

You can serve several small chat LLMs from one 16 GB RTX 5080, behind a single