Part 3 of a 4-part series. QLoRA explained — quantize the frozen base to 4-bit, then LoRA on top. The BitsAndBytesConfig that matters, the memory-footprint moment, and why it's about fit, not speed.

Part 2 of a 4-part series. How LoRA works (the low-rank trick), a working PEFT config, and three real GPU walls I hit — the FP16 unscale error, an OOM, and a 2-GPU speed mystery.

Part 3 of a 4-part series. QLoRA explained — quantize the frozen base to 4-bit, then LoRA on top. The BitsAndBytesConfig that matters, the memory-footprint moment, and why it's…