It wasn’t that long ago that you operated a data center and you could plan for three, four, or many more IT refresh cycles without any major power or cooling updates. We are now in a new era of accelerated compute AI where power densities are growing fast from an average of around 10kW a per rack for CPU based cloud servers a few years ago to over 140kW per rack for the new NVIDIA NVL 72 GB300 (72 GPUs) for AI.

However, these systems represent the leading edge of accelerated compute AI performance and cost, which are well suited for the most advanced model training or fully autonomous agentic AI inference. They are also liquid cooled.

We are at a point in time where enterprise and colocation data center operators are deploying working AI inference models for product design and development, improved customer experience, and increased employee productivity and innovation.

These AI models can even operate high compute intensive multiple modalities (image to text, text to video, etic). Companies deploying inference models ideally want them to be efficient, so the models should be compressed and tuned to 'fit' the application. This allows rightsizing or optimizing the accelerated compute IT stacks to operate at a lower kW per rack.