Xiaomi MiMo-V2.5-Pro-UltraSpeed decodes past 1000 tokens per second on commodity GPUs using FP4 quantization and DFlash speculative decoding.

MiMo-V2.5-Pro-UltraSpeed from Xiaomi clears the 1000-tokens-per-second decoding threshold that custom silicon companies spent years building toward, and it does so on regular GPUs.