I've been leaning on AI inside my editor for a while now, and Cursor is the tool that finally made it stick. It sits right in the IDE, understands my files, genuinely good at the boring stuf, refactors...
But the more I leaned on it, the more one number kept nagging at me: tokens. Every prompt, every file I dragged in, every "explain this" : all of it burns through cloud usage, and on a busy day that adds up fast. The hard, occasional problems were worth it. The endless little ones weren't, and those were most of my day.
So the real question wasn't "is the cloud good enough". It was: why am I paying cloud tokens for work a local model could handle for free? I wanted the Cursor experience for the everyday grind without metering every keystroke against a usage limit. So I wired Cursor up to Ollama and ran Qwen 2.5 Coder 14B on my own server.
The privacy angle came along for the ride and turned out to be a genuine bonus : private repos, client code, and internal logic now stay on my own box. Saving tokens is what got me to actually do this, everything else was upside.
The thing that makes this possible is that Ollama speaks the OpenAI API : /v1/models, /v1/chat/completions, all of it. So anything expecting an OpenAI-style endpoint can be pointed at a local model instead. Cursor included.











