Repo: github.com/AmmarHassona/trainsafe

I was fine-tuning an open-source small language model (SLM) on Arabic using DPO. I had the data, the pipeline, and everything else set up for training, and I was fairly confident the run would improve the model and align it more closely with what I wanted. I started the training and let it run until it finished. When I came back to test the checkpoint, it was speaking Chinese.

Loss only tells you that the model is learning something, not what it is actually learning. By the time training finished, I had wasted my time and my compute with nothing useful to show for it. If only there were something to tell me whether training was actually going well before it was too late.

That is when I started looking for tools that could solve this problem. Nothing did exactly what I needed, so I built trainsafe. It plugs into any HuggingFace or TRL training pipeline with two lines of code, runs alongside your training, and checks at every eval checkpoint whether the model's outputs are still behaving correctly. It catches issues like language drift, output collapse, and repetition loops before the run finishes.
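To make those three failure modes concrete, here is a minimal sketch of how such checks could be written in plain Python. This is not trainsafe's implementation or API: the function names, the thresholds, and the use of the Arabic Unicode block as a language signal are all assumptions made for illustration.

```python
from collections import Counter


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters in the main Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def distinct_output_ratio(outputs: list[str]) -> float:
    """Fraction of outputs that are unique; low values suggest collapse."""
    if not outputs:
        return 1.0
    return len({o.strip() for o in outputs}) / len(outputs)


def check_outputs(
    outputs: list[str],
    min_arabic: float = 0.5,      # hypothetical threshold
    max_repetition: float = 0.3,  # hypothetical threshold
    min_distinct: float = 0.5,    # hypothetical threshold
) -> list[str]:
    """Return the names of the checks that fail for a batch of generations."""
    if not outputs:
        return []
    issues = []
    mean_arabic = sum(arabic_ratio(o) for o in outputs) / len(outputs)
    mean_repetition = sum(repeated_ngram_ratio(o) for o in outputs) / len(outputs)
    if mean_arabic < min_arabic:
        issues.append("language_drift")
    if mean_repetition > max_repetition:
        issues.append("repetition_loop")
    if distinct_output_ratio(outputs) < min_distinct:
        issues.append("output_collapse")
    return issues
```

Running something like `check_outputs` on a handful of generations from fixed prompts at each eval step would flag a run that has drifted out of Arabic long before the final checkpoint, which is the kind of signal loss alone does not provide.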

Getting Started