Originally published at deepu.tech.
In my release post for LlamaStash I made a claim I need to back up. The wrapper adds zero overhead vs running llama-server directly. That is the kind of claim that should not exist in a blog post without numbers behind it. So here are the numbers.
LlamaStash spawns the unmodified upstream llama-server. So three different questions follow from that, and there is a benchmark suite for each.
Suite A: overhead regression. Does llamastash start <model> add any measurable overhead on top of raw llama-server when both run the same command line? This is the question the whole architecture depends on.
Suite B: cross-tool comparison. How does LlamaStash-as-shipped compare to Ollama and LM Studio on the same model, same hardware, through their OpenAI-compatible HTTP endpoints? This is the question users care about.






