0G trains 107B parameter decentralized model with China Mobile, a first for AI above 100 billion parameters

Training a 107-billion-parameter AI model is hard enough when you have a warehouse full of cutting-edge GPUs connected by ultra-fast networking. Doing it across decentralized clusters on a standard 1 Gbps network? That’s a fundamentally different engineering challenge. 0G Labs claims to have pulled it off.

The project, completed in July 2025 in partnership with China Mobile, represents the first successful decentralized training of an AI model exceeding 100 billion parameters. The research paper detailing the methodology was published on arXiv on June 26, 2025, under the code arXiv:2506.21263.

How DiLoCoX actually works

The standard approach, known as AllReduce, requires all nodes to constantly share gradient updates with each other. DiLoCoX instead lets clusters of NVIDIA A800 GPUs work semi-independently, synchronizing far less frequently.

The framework employs several technical innovations to make this possible. Pipeline parallelism breaks the model into stages that can be processed sequentially across devices. A dual optimizer policy uses different optimization strategies for local and global training steps. One-step-delay overlap allows computation to continue while synchronization happens in the background. And adaptive gradient compression squeezes down the data that needs to travel between clusters.

How DiLoCoX actually works

0G trains 107B parameter decentralized model with China Mobile, a first for AI above 100 billion parameters

0G trains 107B parameter decentralized model with China Mobile, a first for AI above 100 billion parameters

Other newsrooms on this story

Related reading

This Half-Gigabyte AI Model Runs Local Agents on Your Phone - Decrypt

Prime Intellect builds decentralized training ground for AI models

NVIDIA, Telecom Leaders Build AI Grids to Optimize Inference on Distributed…

How Alibaba builds its most efficient AI model to date

5 New Digital Twin Products Developers Can Use to Build 6G Networks | NVIDIA…

Telecom giants launch AI token services

Other newsrooms on this story

Related reading

This Half-Gigabyte AI Model Runs Local Agents on Your Phone - Decrypt

Prime Intellect builds decentralized training ground for AI models

NVIDIA, Telecom Leaders Build AI Grids to Optimize Inference on Distributed…

How Alibaba builds its most efficient AI model to date

5 New Digital Twin Products Developers Can Use to Build 6G Networks | NVIDIA…

Telecom giants launch AI token services