A few days ago I finished benchmarking something I've been building - a cache-aware, stable, histogram-based sorting algorithm I'm calling BusSort. The results surprised even me.

At 100 million elements, it runs ~2x faster than Java's Dual-Pivot Quicksort on random data - while being stable. Dual-Pivot QS is not.

The Problem With Quicksort at Scale

Quicksort-based algorithms partition elements with random writes across the entire array. At large scales this causes cache thrashing - elements are being written to memory locations all over the place, constantly missing L1 and L2 cache.

The larger the array, the worse it gets.