The NVIDIA CUDA Core Compute Libraries (CCCL) provides delightful and efficient abstractions for CUDA developers in C++ and Python. It features:

Parallel algorithms – Host-launched algorithms including sort, scan and reduce that remove the need to write custom kernels for common operations

Cooperative algorithms – Device-side algorithms such as block-wide or warp-wide reductions or scans that simplify custom kernel development

Language idiomatic CUDA abstractions – Fundamental abstractions for CUDA-specific operations including memory allocation, resource management, and hardware features

This post introduces a new group of functionality in CCCL that provides modernized C++ abstractions for fundamental CUDA programming model concepts that make CUDA C++ development safer and more convenient.