Computer-aided engineering (CAE) is shifting from human-driven workflows toward AI-driven ones, including physics foundation models that generalize across geometries and operating conditions. Unlike LLMs, these models depend on large volumes of high-fidelity, physics-compliant data.

Recent scaling-law work on computational fluid dynamics (CFD) surrogates indicates that simulation-generated training data is often the limiting cost in practice. This pushes requirements onto the simulator, which must be GPU-native, fast, and able to plug directly into ML workflows.

NVIDIA Warp is a framework for accelerated simulation, data generation, and spatial computing that bridges CUDA and Python. Warp enables developers to write high-performance kernels as regular Python functions that are JIT-compiled into efficient code for execution on the GPU. Unlike the tensor-based frameworks, in which developers express computation as operations on entire N-dimensional arrays, developers author flexible kernels in the Warp framework that execute simultaneously across all elements of a computational grid.

Simulation kernels are often expressed on computational grids and rely on data-dependent control flow like conditionals, early-outs, and selective updates that vary per element. In tensor frameworks, these patterns require composing Boolean masks that quickly become unwieldy and can waste computation on irrelevant elements. In a Warp kernel, each thread can branch, skip, or exit independently, expressing this logic naturally without masking workarounds.