LoRA: Fine-Tune a Giant Model by Training 1% of It

Fine-tuning a large model used to mean one painful thing: update every weight in it, keep a full copy per task, and pay for the GPUs to do it. A 7-billion-parameter model has 7 billion knobs. The Adam optimizer keeps two extra numbers per knob, so you're suddenly holding roughly three times the model in memory. Then, for every new task you tune, you store another 13 GB checkpoint. It works, but it's absurdly out of proportion to how much the model actually needs to change to, say, adopt a new tone or learn your domain's vocabulary.

LoRA — Low-Rank Adaptation — is the trick that makes that whole problem go away. And it's built on one clean observation.

The update is low-rank

When you fine-tune, what you really produce is a difference: the new weights minus the old weights, call it ΔW. The insight behind LoRA is that this difference has low "intrinsic rank." The useful change lives in a tiny subspace, not the full d×d space of the matrix. A big matrix of numbers can often be reconstructed almost perfectly from the product of two skinny matrices.

So instead of learning ΔW directly, LoRA learns two small factors whose product is ΔW:

LoRA — Low-Rank Adaptation — is the trick that makes that whole problem go away. And it's built on one clean observation.

The update is low-rank

So instead of learning ΔW directly, LoRA learns two small factors whose product is ΔW:

LoRA: Fine-Tune a Giant Model by Training 1% of It

LoRA: Fine-Tune a Giant Model by Training 1% of It

Related reading

I Fine-Tuned a 270M Model on My Laptop (Full Fine-Tuning, From Scratch)

LoRA: I Trained <1% of a 1.5B Model and Matched a Full Fine-Tune

Beyond LoRA: Can you beat the most popular fine-tuning technique?

Fine-tuning — Domain-Specializing Models with LoRA

If a 270M Model Already Worked, Why Did I Fine-Tune a 7B One?

Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts, Enhanced Hugging…

Related reading

I Fine-Tuned a 270M Model on My Laptop (Full Fine-Tuning, From Scratch)

LoRA: I Trained <1% of a 1.5B Model and Matched a Full Fine-Tune

Beyond LoRA: Can you beat the most popular fine-tuning technique?

Fine-tuning — Domain-Specializing Models with LoRA

If a 270M Model Already Worked, Why Did I Fine-Tune a 7B One?

Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts, Enhanced Hugging…