Continuous batching for GRPO, now in TRL


Contents: Just one flag, Some numbers, When to reach for it, One more thing, Getting it, Still moving, Resources

Continuous batching has been an ongoing effort in transformers for a few months now. The aim is a fast, memory-aware generation path that lives inside the library itself, and it has been documented as it grew: first the core mechanism, then the asynchronous version (h/t @ror 🐐).

Now those efforts have gone beyond generation and into training. GRPO in TRL can use continuous batching for its rollouts.

Online RL is generation-heavy: producing the rollouts is usually the most expensive part of the loop, so the generation path is where the speed lives. Until now TRL gave you two options. The default generate() is simple and in-process but wasteful when you ask for many completions. vLLM is very fast, but it is a separate inference engine to bring in and manage (as its own server, or colocated on the training GPUs). Continuous batching fills the gap in the middle: an in-process path that does not waste compute and memory when the number of completions N is high, using transformers directly, with no vLLM dependency and no weight syncing between two copies of the model.
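To make the "wasteful when you ask for many completions" point concrete, here is a toy scheduling model. It is a sketch under loud assumptions, not the transformers or TRL implementation: the function names `static_batching_steps` and `continuous_batching_steps` are hypothetical, the completion lengths are made up for illustration, each decode step is assumed to cost the same, and prefill and KV-cache memory are ignored. It only counts how many decode steps are needed when a batch must wait for its longest completion, versus when a finished slot is refilled right away.

```python
from collections import deque


def static_batching_steps(lengths, slots):
    """Decode steps when each fixed batch runs until its longest sequence ends."""
    steps = 0
    for start in range(0, len(lengths), slots):
        # Short sequences sit idle (padded) until the longest one finishes.
        steps += max(lengths[start:start + slots])
    return steps


def continuous_batching_steps(lengths, slots):
    """Decode steps when a freed slot is handed to the next waiting request."""
    waiting = deque(lengths)  # tokens still to generate, one entry per request
    active = []               # remaining tokens for the sequences in flight
    steps = 0
    while waiting or active:
        # Refill any free slot before the next decode step.
        while waiting and len(active) < slots:
            active.append(waiting.popleft())
        steps += 1
        # Every active sequence emits one token; finished ones leave the batch.
        active = [left - 1 for left in active if left - 1 > 0]
    return steps


# Illustrative lengths only: two long completions mixed with six short ones.
lengths = [8, 2, 2, 2, 8, 2, 2, 2]
slots = 4
useful_tokens = sum(lengths)

static = static_batching_steps(lengths, slots)
continuous = continuous_batching_steps(lengths, slots)

print(static, useful_tokens / (static * slots))          # 16 0.4375
print(continuous, useful_tokens / (continuous * slots))  # 10 0.7
```

In this toy setting the same 28 tokens take 16 steps with fixed batches and 10 with slot refilling, because the short completions no longer wait on the long ones. Real speedups depend on the length distribution, the model, and memory limits, none of which this sketch captures.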
