Develop High-Performance GPU Kernels in C++ with NVIDIA CUDA Tile | NVIDIA Technical Blog

Developers can now use NVIDIA CUDA Tile programming within large existing C++ GPU codebases to develop highly optimized GPU kernels using tile-based abstractions.

NVIDIA CUDA Tile, launched with NVIDIA CUDA 13.1, introduced tile-based programming for GPUs. It is designed in two layers: a high-level language layer on top, and an intermediate layer underneath that any high-level programming language can target. CUDA Tile automatically makes use of advanced NVIDIA hardware capabilities, including tensor cores, shared memory, and tensor memory accelerators, without requiring the application to target them directly.

Python was the first language supported for tile-based GPU applications. The newly released CUDA 13.3 adds support for writing tile kernels in C++, enabling developers to build highly optimized GPU kernels.

What is CUDA Tile C++?

CUDA Tile C++ is an expression of the CUDA Tile programming model in C++, built on top of the CUDA Tile IR specification. It lets developers write tile kernels in C++ and express GPU kernels with a tile-based model, either instead of or alongside the single instruction, multiple threads (SIMT) model.
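To make the contrast between the two models concrete, here is a minimal host-side C++ sketch of vector addition written both ways. It is an illustration under stated assumptions, not CUDA Tile C++ code: the actual CUDA Tile C++ API is not reproduced here, and every name below (`simt_vector_add`, `tile_vector_add`, `load_tile`, `store_tile`, `add_tiles`, `kTile`) is hypothetical. In the SIMT version each loop index stands for one element, as a GPU thread would. In the tile version each loop index stands for a whole tile, and the kernel body operates on tiles as values.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative host-side simulation of the two programming models.
// This is NOT the CUDA Tile C++ API; every name here is hypothetical.

// SIMT view: each "thread" owns exactly one element. On a GPU, `tid` would
// come from blockIdx.x * blockDim.x + threadIdx.x; here we simply loop over it.
void simt_vector_add(const std::vector<float>& a, const std::vector<float>& b,
                     std::vector<float>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    for (std::size_t tid = 0; tid < c.size(); ++tid) {
        c[tid] = a[tid] + b[tid];  // one scalar operation per thread
    }
}

// Tile view: each "program instance" owns a whole tile of kTile elements.
constexpr std::size_t kTile = 4;  // hypothetical tile size
using Tile = std::array<float, kTile>;

// Load a tile starting at `base`, padding out-of-range lanes with zero.
Tile load_tile(const std::vector<float>& src, std::size_t base) {
    Tile t{};  // value-initialized to zeros
    for (std::size_t i = 0; i < kTile && base + i < src.size(); ++i) {
        t[i] = src[base + i];
    }
    return t;
}

// Store only the in-range lanes of a tile back to memory.
void store_tile(std::vector<float>& dst, std::size_t base, const Tile& t) {
    for (std::size_t i = 0; i < kTile && base + i < dst.size(); ++i) {
        dst[base + i] = t[i];
    }
}

// Element-wise addition expressed as one operation on whole tiles.
Tile add_tiles(const Tile& x, const Tile& y) {
    Tile r{};
    std::transform(x.begin(), x.end(), y.begin(), r.begin(),
                   [](float p, float q) { return p + q; });
    return r;
}

void tile_vector_add(const std::vector<float>& a, const std::vector<float>& b,
                     std::vector<float>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    const std::size_t num_tiles = (c.size() + kTile - 1) / kTile;  // ceil-div
    for (std::size_t tile_id = 0; tile_id < num_tiles; ++tile_id) {
        const std::size_t base = tile_id * kTile;
        store_tile(c, base, add_tiles(load_tile(a, base), load_tile(b, base)));
    }
}
```

Both functions produce identical results. The difference is the unit of work each index names: one element versus one tile. In this sketch the ragged last tile is handled by explicit padding and bounds checks in the hypothetical load and store helpers. The point of a tile model is that the kernel author reasons about whole tiles, leaving the mapping onto threads and hardware features to the toolchain.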
