How much VRAM do you actually need to run Llama 3 or Gemma locally?

Every few days someone in a local LLM thread asks the same question: "will this run on my 3060?" And the answers are almost always vibes. "Should be fine." "Probably need to quantize." Nobody shows the math, so you download 16GB, load it up, and find out the hard way.

I did exactly that a while back. Grabbed an 8B model, it loaded fine on a 12GB card, I felt clever, and then it OOM'd about 20,000 tokens into a long document. The weights fit. The KV cache didn't. That gap is the whole reason for this post.
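To put a rough number on that gap, here is a minimal back-of-the-envelope sketch. The helper name `kv_cache_bytes` is made up for illustration. The shape values (32 layers, 8 KV heads, head dimension 128) are what I understand the published Llama 3 8B config to be, so treat them as assumptions and check them against your model's config. It also assumes a 16-bit cache and ignores framework overhead, so real usage will be somewhat higher.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Estimate the bytes needed to cache keys and values for n_tokens of context."""
    # The leading 2 covers one key tensor plus one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Assumed shape for an 8B Llama-3-style model (verify against the model config).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_tokens=1)
at_20k = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_tokens=20_000)

print(f"{per_token / 1024:.0f} KiB per token")            # 128 KiB per token
print(f"{at_20k / 1024**3:.2f} GiB at 20,000 tokens")     # 2.44 GiB at 20,000 tokens
```

Under those assumptions the cache alone takes roughly 2.4 GiB by the 20,000-token mark, on top of whatever the weights already occupy. That is enough to push a card that loaded the model comfortably over the edge.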

So here is the actual math, with real numbers for Llama 3 and Gemma, including the part that surprised me: two models that look identical on paper can need very different amounts of memory.

Three things eat your VRAM

When you run a model locally, your GPU memory goes to three places:

Related reading

Running Gemma 4 on a Modest Machine: Unsloth vs LM Studio vs llama.cpp vs Ollama

Hardware Guide: What Do You Actually Need to Run Local LLMs?

How Much RAM Do You Really Need to Run LLMs Locally? 2026 Benchmarks

8GB to 70B: A Real Hardware Guide for Local LLMs

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

Gemma 4: The 128K Multimodal Powerhouse in Your Terminal
