Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. Star Us to help devs discover the project. Do give it a try and share your feedback for improving the product.

Training modern Large Language Models is expensive.

When a single training run can consume millions of GPU hours, even small optimization decisions become important. Most developers focus on model architecture, dataset quality, and scaling laws. Yet one of the most influential knobs in training is surprisingly simple:

How should the learning rate change over time?

For years, cosine decay has been the default answer. But many recent LLM projects have quietly adopted an alternative: the multi-step learning rate scheduler.