The generation side is shipping fast (TileGym, AutoKernel, KernelEvolve). The validation side, meaning visibility into what the kernel actually did at runtime, has not kept pace.
TL;DR
In the past nine months, three significant releases for automatic CUDA kernel generation have landed: NVIDIA TileGym, RightNow AutoKernel, and Meta’s KernelEvolve. Each ships training infrastructure for kernel generation. Validation infrastructure, which shows what the generated kernel actually did at runtime, on a real workload, in a production-shaped environment, has not kept pace. eBPF traces are the ground-truth layer that closes the gap.
What “validation” means at the kernel level
Two distinct validation surfaces:









