Prime Intellect has released prime-rl version 0.6.0. The framework targets reinforcement learning on trillion-parameter Mixture-of-Experts (MoE) models, with a focus on heavy agentic workloads such as long-horizon software-engineering (SWE) tasks.
The research team trained GLM-5 on SWE tasks at sequence lengths of up to 131k tokens, with a batch size of 256 rollouts. Step times stayed under five minutes, and the run used only 28 H200 nodes.
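To give a rough sense of scale, the figures quoted above can be combined into a back-of-envelope estimate. This is only a sketch under stated assumptions, not a measurement from the run: it assumes "131k" means 131,072 tokens, that every rollout reaches the maximum length (so the token count is an upper bound), and that each node holds 8 GPUs, which the article does not state. The throughput figure describes a hypothetical worst-case step of exactly five minutes, not a reported result.

```python
# Back-of-envelope scale estimate from the figures quoted in the article.
# Values marked "assumption" are NOT stated in the source.

ROLLOUTS_PER_STEP = 256    # batch size, as stated
MAX_SEQ_LEN = 131_072      # assumption: "131k" read as 2**17 tokens
NODES = 28                 # H200 nodes, as stated
GPUS_PER_NODE = 8          # assumption: a common node layout, not stated
STEP_SECONDS = 5 * 60      # stated ceiling on step time ("under five minutes")


def max_tokens_per_step(rollouts: int, seq_len: int) -> int:
    """Upper bound on tokens in one step if every rollout hits max length."""
    return rollouts * seq_len


def scenario_throughput(tokens: int, seconds: float) -> float:
    """Tokens per second for a hypothetical step of the given duration."""
    return tokens / seconds


tokens = max_tokens_per_step(ROLLOUTS_PER_STEP, MAX_SEQ_LEN)
total_gpus = NODES * GPUS_PER_NODE
tokens_per_second = scenario_throughput(tokens, STEP_SECONDS)

print(f"max tokens per step:      {tokens:,}")
print(f"GPUs (assumed 8 per node): {total_gpus}")
print(f"tokens/s in this scenario: {tokens_per_second:,.0f}")
```

Because real rollouts vary in length and real steps finish in less than five minutes, the actual token count per step and the actual throughput may differ substantially from this scenario.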
TL;DR
prime-rl 0.6.0 trains trillion-parameter MoE models on agentic RL workloads.
GLM-5 trained on SWE at 131k sequence length, sub-5-minute steps, 28 H200 nodes.








