Where tensor-parallel inference hits the NVLink wall

2026-05-31 · GPU / distributed systems

Tensor parallelism splits each layer across GPUs, so every forward pass pays for all-reduces over the interconnect fabric (in the Megatron-style layout, two per transformer layer). On a single node that fabric is NVLink/NVSwitch, and how close you get to its theoretical budget decides whether TP helps or hurts. This post