The generation side is shipping fast (TileGym, AutoKernel, KernelEvolve). The validation side, the surface for “what the kernel actually did at runtime,” has not kept pace.

TL;DR

In the past nine months, three significant releases have landed for auto-generation of CUDA kernels: NVIDIA TileGym, RightNow AutoKernel, and Meta’s KernelEvolve. Each ships training infrastructure for kernel generation. Validation infrastructure (a record of what the generated kernel actually did at runtime, on a real workload, in a production-shaped environment) has not kept pace. eBPF traces are the ground-truth layer that closes the gap.

What “validation” means at the kernel level

Two distinct validation surfaces: