Series — Fine-Tuning, Smallest to Largest (same task, three techniques, smallest model to largest):
Full Fine-Tuning (270M) ← you are here
LoRA (1.5B)
QLoRA (7B)
If the small model worked, why go bigger?
Part 1 of a 4-part series. Full fine-tuning a tiny Gemma 3 model for intent classification — the generative framing, the loss-masking trick, and why full FT is so learning-rate sensitive.
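As a rough illustration of the loss-masking idea named above, here is a minimal sketch. It assumes the common convention of marking prompt tokens with the label -100, which is the default `ignore_index` of PyTorch's cross-entropy loss, so that only the answer tokens contribute to the loss. The helper name `build_labels` and the token IDs are hypothetical, and the series may implement the masking differently.

```python
# Assumption: -100 is used as the "ignore" label, matching the default
# ignore_index of torch.nn.CrossEntropyLoss. Token IDs are made up.
IGNORE_INDEX = -100


def build_labels(prompt_ids, answer_ids):
    """Concatenate prompt and answer, masking the prompt out of the loss."""
    input_ids = list(prompt_ids) + list(answer_ids)
    # Prompt positions get IGNORE_INDEX; answer positions keep their token IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels


# Hypothetical example: a 4-token prompt followed by a 2-token intent label.
input_ids, labels = build_labels([101, 7, 8, 9], [42, 2])
print(input_ids)
print(labels)
```

With labels built this way, the model still sees the full prompt as input but is only graded on the tokens of the intent label.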
The author fine-tuned Gemma 3 (270M parameters) on Banking77 intent classification, reaching ~96% accuracy with full fine-tuning on a laptop. Full fine-tuning updates all weights (4× memory cost) and is fragile: a learning rate of 5e-5 is stable, while 2e-4 crashes. That makes it the expensive baseline to benchmark LoRA and QLoRA against.
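One plausible reading of the 4× figure, offered here as an assumption rather than the author's stated accounting, is that training with an Adam-style optimizer keeps four parameter-sized tensors in memory: the weights, their gradients, and two optimizer moment estimates. The sketch below does that arithmetic for a 270M-parameter model in 32-bit floats. The function name is hypothetical, and activation memory is deliberately left out.

```python
def full_ft_memory_gb(n_params, bytes_per_value=4):
    """Estimate training memory for full fine-tuning with an Adam-style optimizer.

    Assumption: four parameter-sized tensors are held at once (weights,
    gradients, and Adam's first and second moments), all in fp32.
    Activations and framework overhead are not counted.
    """
    per_copy_gb = n_params * bytes_per_value / 1e9
    breakdown = {
        "weights": per_copy_gb,
        "gradients": per_copy_gb,
        "adam_first_moment": per_copy_gb,
        "adam_second_moment": per_copy_gb,
    }
    return breakdown, sum(breakdown.values())


breakdown, total_gb = full_ft_memory_gb(270_000_000)
print(f"weights alone: {breakdown['weights']:.2f} GB")
print(f"weights + gradients + optimizer state: {total_gb:.2f} GB")
```

Under these assumptions the weights take about 1.08 GB and the full training state about 4.32 GB, which fits on a laptop for a 270M model but scales linearly with parameter count.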

Part 4 (finale) of a 4-part series. Three model sizes tied on the same task — so when does bigger actually earn its keep? And the…

Part 2 of a 4-part series. How LoRA works (the low-rank trick), a working PEFT config, and three real GPU walls I hit — the FP16…


Part 3 of a 4-part series. QLoRA explained — quantize the frozen base to 4-bit, then LoRA on top. The BitsAndBytesConfig that…
