On March 16, Hewlett Packard Enterprise announced updates to its Private Cloud AI platform, co-engineered with Nvidia, that deliver up to a 20% improvement in token throughput for AI inference workloads. New network expansion racks, slated for availability in July 2026, will allow the platform to scale to 128 GPUs.

What’s actually changing

Token throughput measures how many tokens (the chunks of text or other data a model reads and generates) an AI model can process per second. A gain of up to 20% means enterprises running generative AI or agentic AI workloads can serve more requests, or get responses back faster, without swapping out hardware.
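One detail worth spelling out: a 20% throughput gain does not cut processing time by 20%. The same amount of work takes 1/1.2 of the original time, roughly 16.7% less. The short sketch below works through that arithmetic. The baseline of 1,000 tokens per second and the 600,000-token job are hypothetical round numbers chosen for illustration, not figures from HPE or Nvidia, and the 20% is the "up to" best case from the announcement.

```python
# Hypothetical figures for illustration only; these are NOT HPE or Nvidia benchmarks.
BASELINE_TOKENS_PER_SEC = 1_000.0
IMPROVEMENT = 0.20  # the "up to 20%" best case from the announcement
JOB_TOKENS = 600_000  # hypothetical amount of inference work


def improved_throughput(baseline: float, improvement: float) -> float:
    """Throughput after a fractional improvement (0.20 means +20%)."""
    return baseline * (1.0 + improvement)


def seconds_for_tokens(tokens: int, tokens_per_sec: float) -> float:
    """Time needed to process a fixed number of tokens at a given rate."""
    return tokens / tokens_per_sec


before = BASELINE_TOKENS_PER_SEC
after = improved_throughput(before, IMPROVEMENT)

t_before = seconds_for_tokens(JOB_TOKENS, before)
t_after = seconds_for_tokens(JOB_TOKENS, after)

# Fraction of time saved is 1 - 1/1.2, i.e. about 16.7%, not 20%.
time_saved = 1.0 - t_after / t_before

print(f"throughput: {before:.0f} -> {after:.0f} tokens/s")
print(f"time for {JOB_TOKENS:,} tokens: {t_before:.0f} s -> {t_after:.0f} s")
print(f"time saved: {time_saved:.1%}")
```

With these inputs the job drops from 600 seconds to 500 seconds. Put the other way, the same hardware can handle 20% more work in a fixed window, which is why throughput gains often show up as extra capacity rather than as proportionally shorter waits.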

The platform now supports Nvidia RTX PRO 6000 Blackwell Server Edition GPUs, which are designed for enterprise data center deployments rather than for workstations or the consumer market.

Scaling to 128 GPUs through the new expansion racks lets enterprises run larger models or serve more concurrent users. For organizations that started small with Private Cloud AI and now need to grow, this lifts what was previously a hard ceiling on capacity.