Artificial intelligence is token-driven. Every prompt, reasoning step, and agent interaction generates tokens. Over the past year, token consumption has grown multifold and now exceeds 10 quadrillion tokens per year. While most tokens so far have been generated by humans interacting with AI, the new era is one in which most tokens will be generated by AI interacting with AI.

Modern agentic systems plan tasks, invoke tools, execute code, retrieve data, and coordinate numerous AI agents across continuous multistep workflows. These interactions generate large volumes of reasoning tokens, expand the KV cache, and require CPU-based sandboxed environments to test and validate results generated by accelerated computing systems. Together, they place low-latency, high-throughput demands on GPUs, CPUs, scale-up domains, scale-out networks, and storage.

Delivering useful intelligence for these modern agentic systems requires fleets of purpose-built rack-scale systems that function together as one coherent AI supercomputer. This post introduces the NVIDIA Vera Rubin POD, a set of five specialized rack-scale systems built on the third-generation NVIDIA MGX rack architecture for the era of agentic AI.