Microsoft’s Azure cloud platform just posted the fastest AI training results at the largest reported scale, powered by a deepened collaboration with Nvidia. The achievement, announced on March 18, 2025, centers on record-setting performance in the MLPerf Training v4.1 benchmarks, the widely recognized independent standard for measuring machine learning hardware performance.

The configuration behind the results: 512 Nvidia H200 GPUs working in concert, delivering a 28% performance improvement over previous setups built on H100 GPUs.

What the benchmarks actually show

In the 2023 MLPerf benchmarks, Azure completed the GPT-3 training benchmark, which uses a 175-billion-parameter model, on 10,752 H100 GPUs in approximately 4 minutes. The new H200-based configuration builds on that foundation with better per-GPU performance, so fewer GPUs are needed to reach comparable training speeds.
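To give a rough sense of what a per-GPU gain means for hardware counts, here is a minimal back-of-the-envelope sketch. It assumes the 28% figure applies per GPU and that throughput scales perfectly linearly with GPU count; neither assumption is confirmed by the published results, and real clusters lose some efficiency to communication overhead. The function name is hypothetical and exists only for illustration.

```python
def gpus_for_same_throughput(baseline_gpus: int, gain_percent: int) -> int:
    """Estimate GPUs needed to match a baseline cluster's throughput.

    Assumes ideal linear scaling (an illustrative simplification, not a
    claim from the benchmark results). Integer arithmetic avoids
    floating-point rounding surprises.
    """
    if baseline_gpus <= 0 or gain_percent < 0:
        raise ValueError("baseline_gpus must be positive and gain_percent non-negative")
    numerator = baseline_gpus * 100
    denominator = 100 + gain_percent
    # Ceiling division: round up, since a fraction of a GPU cannot be provisioned.
    return -(-numerator // denominator)


# Illustrative only: apply the 28% gain to the 2023 cluster size.
baseline = 10_752
needed = gpus_for_same_throughput(baseline, 28)
print(f"{needed} GPUs instead of {baseline}")  # 8400 GPUs instead of 10752
```

Under these idealized assumptions, a 28% per-GPU gain cuts the required GPU count by about 22%, not 28%, because the saving is 1 - 1/1.28.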

The full stack behind these results goes beyond swapping in newer GPUs. Microsoft cited integrated innovations across hardware, networking, and software. The setup uses Nvidia Quantum InfiniBand networking to handle the heavy GPU-to-GPU data transfer that distributed training demands. It also incorporates Nvidia’s microservices alongside Azure’s own AI services, including its AI Foundry platform.