CCCL Runtime: A Modern C++ Runtime for CUDA | NVIDIA Technical Blog

The NVIDIA CUDA Core Compute Libraries (CCCL) provides delightful and efficient abstractions for CUDA developers in C++ and Python. It features:

Parallel algorithms – Host-launched algorithms including sort, scan and reduce that remove the need to write custom kernels for common operations

Cooperative algorithms – Device-side algorithms such as block-wide or warp-wide reductions or scans that simplify custom kernel development

Language-idiomatic CUDA abstractions – Fundamental abstractions for CUDA-specific operations including memory allocation, resource management, and hardware features

This post introduces a new group of functionality in CCCL: modernized C++ abstractions for fundamental concepts of the CUDA programming model, designed to make CUDA C++ development safer and more convenient.

