Scaling AI Inference Across Multiple GPUs Using NVIDIA TensorRT with Multi-Device Inference Support | NVIDIA Technical Blog

Generative AI workloads are rapidly outgrowing the memory and compute budget of a single GPU. For inference developers building media generation pipelines, the challenge is scaling across multiple devices without sacrificing the critical optimizations, such as kernel fusion, memory planning, and quantization, that NVIDIA TensorRT delivers for production deployments.

Multi-device inference support, a new feature introduced in TensorRT 11.0, brings native high-performance multi-GPU inference to the TensorRT runtime, enabling multi-device production deployments targeting edge devices.

By combining the multi-device inference support in TensorRT with Torch-TensorRT, developers can convert large PyTorch models and deploy them outside the framework, beyond the memory and compute limits of a single device.

Download TensorRT 11.0 with multi-device inference support from the NVIDIA Developer portal to unlock native, high-performance multi-device acceleration for your models.

NVIDIA NCCL: The transport layer for distributed inference
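
To make the role of a transport layer concrete, here is a minimal sketch of the kind of collective that NCCL provides: an all-reduce that sums partial results across devices. It is a pure-Python simulation, not NCCL or TensorRT code. Plain lists stand in for per-device memory, and every function name (`shard_columns`, `shard_vector`, `local_matvec`, `all_reduce_sum`) is hypothetical and chosen only for illustration. It also assumes the weight matrix's column count divides evenly by the number of devices.

```python
# Pure-Python simulation of a sharded matrix-vector product followed by an
# all-reduce. No GPUs or NCCL calls are involved; each list entry plays the
# role of one device's local memory. All helper names are hypothetical.

def shard_columns(weight, num_devices):
    """Give each device a contiguous slice of columns from every row."""
    step = len(weight[0]) // num_devices  # assumes an even split
    return [[row[d * step:(d + 1) * step] for row in weight]
            for d in range(num_devices)]

def shard_vector(x, num_devices):
    """Give each device the slice of x that matches its column slice."""
    step = len(x) // num_devices  # assumes an even split
    return [x[d * step:(d + 1) * step] for d in range(num_devices)]

def local_matvec(weight_shard, x_shard):
    """Compute one device's partial output from its local shards only."""
    return [sum(w * v for w, v in zip(row, x_shard)) for row in weight_shard]

def all_reduce_sum(partials):
    """Simulate an all-reduce: every device ends up with the elementwise sum."""
    total = [sum(values) for values in zip(*partials)]
    return [list(total) for _ in partials]

weight = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
x = [1, 0, 2, 1]
num_devices = 2

w_shards = shard_columns(weight, num_devices)
x_shards = shard_vector(x, num_devices)

# Each simulated device computes a partial result with no communication.
partials = [local_matvec(w, xs) for w, xs in zip(w_shards, x_shards)]

# The all-reduce is the only step that moves data between devices.
reduced = all_reduce_sum(partials)
print(reduced[0])
```

In this sketch each device holds only half of the weight matrix, which is how sharding relieves single-device memory pressure. The cost is the single communication step at the end, which is the part a library such as NCCL would carry out over the GPU interconnect.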

