Serving several chat LLMs from one 16 GB RTX 5080 by reusing an existing router + ~150 lines of shell — with controlled benchmarks: where memory split doesn't matter, where prefix caching doubles throughput, and where 'smart' routing made things slower.