In Part 2, LoRA let me fine-tune a 1.5B model by freezing it and training tiny adapters. But the frozen base still sat in memory in 16-bit (~3GB). Now I wanted to go to Qwen2.5-7B — and hit a wall that LoRA alone doesn't solve.
The problem
A 7B model is ~15GB in 16-bit precision. A free-tier T4 GPU has 16GB. It would barely load, with no room left to actually train.
The QLoRA insight
QLoRA asks the question that naturally follows from LoRA: the base is frozen and only ever read — so why store it in full precision?







