Free Models, Zero Compromise: Routing to Local and Free Tiers

Not every request needs a frontier model, and a surprising share of them can run for nothing at all. The problem is that "free" usually sounds like "worse," so teams pay for every request just to be safe. Routing is what removes that trade-off.

There are actually two separate pools of zero-cost inference, and they behave very differently. It's worth knowing both before you decide what to send where.

Pool one: local models

Run a model on your own hardware and the marginal cost of a request is zero. Manifest connects local servers the same way it connects anything else: Ollama, LM Studio, and llama.cpp, plus any other OpenAI-compatible server you point it at.

Three things make local special. It's free, in the sense that you pay for electricity, not per token. It's private, because the prompt never leaves your machine. And it has no rate limits, because you aren't sharing a quota with strangers. The catch is just as simple: you need the hardware, and a small local model is not Opus. Which is exactly why you don't send it the hard work.

There are actually two separate pools of zero-cost inference, and they behave very differently. It's worth knowing both before you decide what to send where.

Pool one: local models

Free Models, Zero Compromise: Routing to Local and Free Tiers

Other newsrooms on this story

Free Models, Zero Compromise: Routing to Local and Free Tiers

Other newsrooms on this story

Related reading

Choosing the Right Model-Routing Threshold for Frontier Models

Cutting our LLM bill ~80% with model routing: the actual cost math

Inference Routing Is Becoming an Infrastructure Placement Problem

Free students, paid teachers: how cheap LLMs learn from expensive ones

Frontier-Quality Coding at Cheap-Tier Cost: What We Built, and How We Measured…

Why One Model Is Never Enough: Routing Incident Analysis With cascadeflow

Related reading

Choosing the Right Model-Routing Threshold for Frontier Models

Cutting our LLM bill ~80% with model routing: the actual cost math

Inference Routing Is Becoming an Infrastructure Placement Problem

Free students, paid teachers: how cheap LLMs learn from expensive ones

Frontier-Quality Coding at Cheap-Tier Cost: What We Built, and How We Measured…

Why One Model Is Never Enough: Routing Incident Analysis With cascadeflow