Story from 1 source

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling

As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared memory traffic, and a hardware-software hybrid approach to softmax exponentials.

Reported by

together.ai

Chronological timeline

Wednesday, May 27, 2026 · together.ai
FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared…
Wednesday, May 27, 2026 · together.ai
ThunderKittens Now Optimized for NVIDIA Blackwell GPUs
At Together AI, we have been investing in the ThunderKittens framework - a software library that we developed in collaboration with researchers at Stanford to make it easier to…
