Accelerating Vision AI Pipelines with Batch Mode VC-6 and NVIDIA Nsight | NVIDIA Technical Blog

As model throughput in vision AI systems continues to improve, the surrounding pipeline stages, including decode, preprocessing, and GPU scheduling, must keep pace. The previous post, Build High-Performance Vision AI Pipelines with NVIDIA CUDA-Accelerated VC-6, described this as the data-to-tensor gap: a performance mismatch between AI pipeline stages.

The SMPTE VC-6 (ST 2117-1) codec addresses this gap through a hierarchical, tile-based architecture. Images are encoded as progressively refinable Levels of Quality (LoQs), each adding incremental detail. This enables selective retrieval and decoding of only the required resolution, region of interest, or color plane, with random access to independently decodable frames. Pipelines can retrieve and decode only what the model needs.
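The following minimal Python sketch makes the selective-decode idea concrete by picking the coarsest LoQ that still covers a model's input size. For illustration only, it assumes that each successive LoQ halves both the width and the height of the level above it, with LoQ-0 as full resolution. The helper names `loq_dimensions` and `select_loq` and the LoQ count are hypothetical and are not part of any VC-6 SDK API.

```python
import math


def loq_dimensions(full_w: int, full_h: int, loq: int) -> tuple[int, int]:
    # Hypothetical assumption: each LoQ halves both axes of the level above it,
    # rounding up so odd dimensions are not lost.
    scale = 2 ** loq
    return math.ceil(full_w / scale), math.ceil(full_h / scale)


def select_loq(full_w: int, full_h: int, target_w: int, target_h: int,
               num_loqs: int) -> int:
    # Walk from full resolution (LoQ-0) toward coarser levels and keep the
    # last level whose dimensions still cover the model's input size.
    best = 0
    for loq in range(num_loqs):
        w, h = loq_dimensions(full_w, full_h, loq)
        if w >= target_w and h >= target_h:
            best = loq
        else:
            break  # every coarser level is smaller still
    return best  # falls back to LoQ-0 if even full resolution is too small


if __name__ == "__main__":
    # ~4K source, 224x224 classifier input, 5 LoQs (hypothetical count)
    loq = select_loq(3840, 2160, 224, 224, num_loqs=5)
    print(loq, loq_dimensions(3840, 2160, loq))  # prints: 3 (480, 270)
```

Under the halving assumption, a ~4K source feeding a 224×224 model only needs LoQ-3 (480×270), so the three finer levels never have to be reconstructed.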

However, efficient single-image execution does not automatically translate to efficient scaling. As batch sizes grow, the bottleneck shifts from single-image kernel efficiency to workload orchestration, launch cadence, and GPU occupancy.
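A common batching pattern illustrates the shift toward launch cadence and occupancy: the tiles of every image in a batch are flattened into one index space, so a single kernel launch covers the whole batch instead of one launch per image. The minimal Python sketch below shows only the host-side index bookkeeping for that pattern. It is not the VC-6 implementation, and the 256-pixel square tile size and the helper names (`tile_grid`, `build_tile_offsets`, `locate_tile`) are hypothetical.

```python
import bisect
import math


def tile_grid(width: int, height: int, tile: int) -> tuple[int, int]:
    # Number of tiles along each axis; partial edge tiles still count.
    return math.ceil(width / tile), math.ceil(height / tile)


def build_tile_offsets(image_sizes: list[tuple[int, int]], tile: int) -> list[int]:
    # Exclusive prefix sum of per-image tile counts. offsets[i] is the first
    # flat tile index of image i; offsets[-1] is the total grid size for ONE
    # launch that covers the whole batch.
    offsets = [0]
    for width, height in image_sizes:
        tiles_x, tiles_y = tile_grid(width, height, tile)
        offsets.append(offsets[-1] + tiles_x * tiles_y)
    return offsets


def locate_tile(global_idx: int, image_sizes: list[tuple[int, int]],
                offsets: list[int], tile: int) -> tuple[int, int, int]:
    # Map a flat block index (what blockIdx.x would be in a batched kernel)
    # back to (image index, tile column, tile row) with a binary search.
    if not 0 <= global_idx < offsets[-1]:
        raise IndexError("global_idx outside the batched grid")
    img = bisect.bisect_right(offsets, global_idx) - 1
    local = global_idx - offsets[img]
    tiles_x, _ = tile_grid(*image_sizes[img], tile)
    return img, local % tiles_x, local // tiles_x


if __name__ == "__main__":
    batch = [(1920, 1080), (960, 540)]  # mixed resolutions in one batch
    offsets = build_tile_offsets(batch, tile=256)  # hypothetical tile size
    print("grid size for one launch:", offsets[-1])
    print(locate_tile(40, batch, offsets, tile=256))
```

In a CUDA version of this pattern, `offsets` would be copied to device memory and each block would run the same binary search on its `blockIdx.x`. The launch count then stays at one regardless of batch size or how resolutions are mixed within the batch.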

This post focuses on the architectural changes required to scale VC-6 decoding for batched inference and training workloads. NVIDIA Nsight Systems and NVIDIA Nsight Compute help developers identify system-level and kernel-level constraints, and both tools were used to redesign the VC-6 CUDA implementation for batch throughput. The result is up to ~85% lower per-image decode time than the previous implementation, with submillisecond decode for LoQ-0 (~4K) in batch, ~0.2 ms for lower LoQs, and identical output quality. These gains significantly improve pipeline efficiency for production vision AI workloads.
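The figures above convert as follows in a rough worked example. It assumes that decode throughput scales with the inverse of per-image decode time, which ignores every other pipeline stage:

```latex
\begin{aligned}
t_{\text{new}} &= (1 - r)\,t_{\text{old}}, \qquad r \approx 0.85 \\
\frac{\text{throughput}_{\text{new}}}{\text{throughput}_{\text{old}}}
  &= \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{1 - r} \approx \frac{1}{0.15} \approx 6.7 \\
t \approx 0.2\ \text{ms/image} \;&\Rightarrow\; \frac{1}{0.2 \times 10^{-3}\ \text{s/image}} = 5000\ \text{images/s}
\end{aligned}
```

Because of the "up to" wording, the ~6.7× ratio is an upper bound. The 5,000 images/s figure holds only if 0.2 ms is the amortized per-image time within a batch, and only for the decode stage in isolation.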
