What Happened This Week

Week 1 established the baseline. This week is where the actual engineering begins.

Before any fine-tuning can happen, the training data has to be in the exact format the model expects. That sounds simple. It is not. This week involved loading a 112K-row medical dataset, discovering it was the wrong dataset for the goal, switching to a different dataset, building a cleaning pipeline, and formatting everything into the Llama 3.2 chat template. Every step had a decision worth explaining.

The Wrong Dataset

The initial plan was to use lavita/medical-qa-datasets with the medical_meadow_medqa subset. Loading it and inspecting the samples revealed a problem I initially ignored.