What Ollama actually is

Ollama is an open-source runtime for large language models that runs on your own computer — Mac, Windows, or Linux. Think of it as the “Docker for LLMs”: instead of wrestling with Python environments, model weights, and GPU drivers, you type one command and a model is running.

The pitch is simple: keep your data on your machine, pay nothing per token, and work offline. When you run ollama run gemma4, Ollama downloads the model, loads it into your GPU’s memory (or system RAM if you don’t have a GPU), and drops you into a chat prompt. That’s it.

Behind that simplicity, Ollama is doing a lot of work for you:

Model management — pulling, versioning, and storing models from its registry, the way a package manager handles software.