Ditching the cloud for local AI — how I use two mini PCs to process millions of tokens a day and save money on costly API fees

(Image credit: Framework)

For heavy AI users, the economics of the current boom are starting to bite. Over the past year, major labs have nudged prices upward while tightening the screws on usage, whether through stricter rate limits, smaller context windows on lower tiers, or the gradual reshuffling of features behind more expensive plans. Even where headline per-token costs have fallen, the reality for users is more complicated: higher volumes, more complex workflows, and new tooling expectations mean monthly bills are creeping up, not down.

At the same time, open-weight models have improved rapidly, consumer hardware has become more capable, and tools like LM Studio, Ollama, and llama.cpp have made local deployment far more accessible than it was even a year ago. The result is a renaissance in running models on your own machines.

Chris Stokel-Walker is a Tom's Hardware contributor who focuses on the tech sector and its impact on our daily lives, online and offline. He is the author of How AI Ate the World, published in 2024, as well as TikTok Boom, YouTubers, and The History of the Internet in Byte-Sized Chunks.
