The First Disaggregated Inference Demo for AI Agents Is Live

SambaNova demonstrates how GPUs and RDUs work together to deliver premium inference for agent workloads using the right chip for the right workload.

At COMPUTEX, SambaNova demonstrated what the next era of AI inference looks like: premium inference for AI agents powered by GPUs and RDUs, running live for the first time in the newly announced VC2 data center.

With Nvidia’s B200 GPU handling prefill and SambaNova’s SN40 RDU handling decode, the system generates output at 2X the speed of B200-only configurations, as verified by Artificial Analysis.

The system is running today in Vector Core Compute's (VC2) data center, with Together.AI as the first commercial customer to use VC2's inference capabilities.

This is a new operating model for inference providers. Coding agents are moving from novelty to daily developer workflow. OpenAI is talking about long-horizon Codex runs that can plan, edit, test, repair, and keep going for hours. Anthropic and other frontier AI companies are seeing the same demand curve: Developers want agents that can take on bigger chunks of real work, faster.
