Part 4 (finale) of a 4-part series. Three model sizes tied on the same task — so when does bigger actually earn its keep? And the bug no model size could fix.

Part 1 of a 4-part series. Full fine-tuning a tiny Gemma 3 model for intent classification — the generative framing, the loss-masking trick, and why full FT is so learning-rate…
