Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl | NVIDIA Technical Blog

NVIDIA CUDA Tile (cuTile) is a tile-based programming model that lets developers write GPU kernels in terms of tile-level operations (loads, stores, and matrix multiply-accumulate) rather than manually coordinating threads, warps, and shared memory.
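To make the tile-level vocabulary concrete, here is a CPU-only sketch in plain Python of how a matrix multiplication decomposes into tile loads, multiply-accumulate steps, and stores. It deliberately does not use the cuTile API. The helpers `load_tile`, `mma`, `store_tile`, and `tiled_matmul` are hypothetical names, the tile sizes are toy values, and the outer loops over `(bm, bn)` stand in for the independent tile blocks that a GPU would run in parallel.

```python
# CPU-only illustration of tile-level matmul; NOT the cuTile API.
# Helper names are hypothetical. Each (bm, bn) iteration plays the role of one tile block.

def load_tile(mat, row0, col0, th, tw):
    """Load a th x tw tile starting at (row0, col0), zero-padding out-of-range elements."""
    rows, cols = len(mat), len(mat[0])
    return [[mat[r][c] if r < rows and c < cols else 0.0
             for c in range(col0, col0 + tw)]
            for r in range(row0, row0 + th)]


def mma(a_tile, b_tile, acc):
    """Matrix multiply-accumulate on small tiles: acc += a_tile @ b_tile."""
    for i in range(len(a_tile)):
        for j in range(len(b_tile[0])):
            s = 0.0
            for k in range(len(b_tile)):
                s += a_tile[i][k] * b_tile[k][j]
            acc[i][j] += s
    return acc


def store_tile(out, tile, row0, col0):
    """Write a tile back into `out`, dropping elements that fall outside it."""
    for i, row in enumerate(tile):
        for j, v in enumerate(row):
            r, c = row0 + i, col0 + j
            if r < len(out) and c < len(out[0]):
                out[r][c] = v


def tiled_matmul(A, B, tm=2, tn=2, tk=2):
    """C = A @ B computed tile by tile (A is M x K, B is K x N)."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    # Ceiling division gives the number of tiles along each axis.
    for bm in range(-(-M // tm)):
        for bn in range(-(-N // tn)):
            acc = [[0.0] * tn for _ in range(tm)]  # per-block accumulator tile
            for bk in range(-(-K // tk)):          # march along the K dimension
                a = load_tile(A, bm * tm, bk * tk, tm, tk)
                b = load_tile(B, bk * tk, bn * tn, tk, tn)
                mma(a, b, acc)
            store_tile(C, acc, bm * tm, bn * tn)
    return C
```

Calling `tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` returns `[[58.0, 64.0], [139.0, 154.0]]`. Because K = 3 is not a multiple of the tile size 2, the zero padding in `load_tile` is what makes the partial tile along K safe to multiply.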

cuTile.jl brings the same tile-based approach to Julia, a dynamic programming language, so users can write custom GPU kernels without dropping down to NVIDIA CUDA C++. Custom kernels are often essential in Julia's scientific computing ecosystem, which spans differential equations, probabilistic programming, and physics simulations.

cuTile Python has a growing library of optimized kernels for GPU acceleration. Translating those kernels to cuTile.jl gives the Julia ecosystem immediate access to battle-tested implementations instead of rewriting each one from scratch.

This post covers GPU kernel translation across domain-specific languages (DSLs), specifically porting cuTile Python kernels to cuTile.jl (Julia). It shows how to:

Translate GPU kernels between cuTile Python and cuTile.jl: Walk through a complete matrix multiplication example side-by-side.
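One detail that any translation between the two languages has to handle, whether done by hand or by an agent, is indexing convention: Python indexes from 0 while Julia indexes from 1, and NumPy/CuPy arrays default to row-major (C) order while Julia arrays are column-major. The minimal Python sketch below shows the arithmetic involved. The helper names `py_block_to_julia` and `linear_offset` are hypothetical, and the 0-to-1 shift of tile indices assumes the Julia kernel follows Julia's usual 1-based convention; the cuTile.jl documentation is the authority on the exact indexing rules of its tile operations.

```python
# Index bookkeeping when porting a kernel from Python to Julia.
# Helper names are hypothetical; they are not part of cuTile or cuTile.jl.

def py_block_to_julia(block_id):
    """Map a 0-based Python tile index tuple to a 1-based Julia-style one.

    Assumes the Julia kernel uses 1-based tile indices (Julia's usual convention).
    """
    return tuple(b + 1 for b in block_id)


def linear_offset(index, shape, order):
    """Offset of a 0-based multi-index in a dense array of the given shape.

    order="C": row-major, last axis varies fastest (NumPy/CuPy default).
    order="F": column-major, first axis varies fastest (Julia's Array layout).
    """
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    pairs = list(zip(index, shape))
    if order == "F":
        pairs.reverse()  # column-major: treat the first axis as fastest-varying
    offset = 0
    for i, n in pairs:
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for axis of size {n}")
        offset = offset * n + i
    return offset


# A row-major M x N matrix occupies the same memory as a column-major N x M
# matrix holding its transpose: element (i, j) of one is element (j, i) of the other.
M, N = 3, 4
assert all(
    linear_offset((i, j), (M, N), "C") == linear_offset((j, i), (N, M), "F")
    for i in range(M)
    for j in range(N)
)
```

The final check is the practical consequence for a matrix multiplication port: a row-major M-by-N buffer read as a column-major array looks like its N-by-M transpose, so the translated kernel must adjust either its indexing or its data layout.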
