SambaNova demonstrates how GPUs and RDUs work together to deliver premium inference for agent workloads using the right chip for the right workload.
At COMPUTEX, SambaNova demonstrated what the next era of AI inference looks like: premium inference for AI agents powered by GPUs and RDUs, running live for the first time in the newly announced VC2 data center.
With Nvidia’s B200 GPU handling prefill and SambaNova’s SN40 RDU handling decode, the combined system delivers 2X the inference speed of B200-only configurations, as verified by Artificial Intelligence.
The system is running today in Vector Core Compute's (VC2) data center, with Together.AI as the first commercial customer to use VC2's inference capabilities.
This is a new operating model for inference providers. Coding agents are moving from novelty to a daily developer workflow. OpenAI is talking about long-horizon Codex runs that can plan, edit, test, repair, and keep going for hours. Anthropic and other frontier AI companies are seeing the same demand curve: developers want agents that can take on bigger chunks of real work, faster.