Mid-training is essential for LLM reasoning, IBM study shows

For years, the basic recipe for building a capable large language model was straightforward: train a model on mountains of text, then teach it to respond in a helpful, humanlike way through reinforcement learning. At some point, an intermediate training phase was added, with a heavy focus on math, code, and science, and the reasoning capabilities of LLMs seemed to take a giant leap.

This stage is now referred to as mid-training. It has become a routine, if mysterious, step in training reasoning models to do things like rooting out mistakes in complex code bases, lengthy contracts, or financial statements. A new IBM study explains why mid-training is so effective, in the first large-scale, systematic look at mid-training in open-source LLMs.

Through more than 500 controlled experiments, IBM researchers found that mid-training boosted overall reasoning capabilities in models of varying sizes and architectures by 3 to 4 times, while preserving knowledge gained during pre-training. Models that skipped this extra step and instead trained on the same math and science knowledge via reinforcement learning (RL) during post-training saw only limited improvement.

“Mid-training and reinforcement learning are not interchangeable stages,” said the study’s lead author, Bharat Runwal, an IBM researcher who works on the team behind IBM’s Granite family of models. “They operate through fundamentally different mechanisms, and each does something the other cannot.”

Runwal and his colleagues compared open-source base models drawn from four model families: IBM Granite, Mistral, Meta’s LLaMA, and NVIDIA’s Nemotron-H, ranging from 3 billion to 24 billion parameters in size. They tested both a traditional transformer architecture and a hybrid design that combines a transformer’s attention mechanism with newer recurrent-style processing. The benchmarks included the notoriously difficult Google-Proof Question & Answer (GPQA)-Diamond and the
