MiMo-V2.5-Pro-UltraSpeed from Xiaomi blows past the speed threshold custom silicon companies spent years building toward—on regular GPUs.

Xiaomi MiMo-V2.5-Pro-UltraSpeed decodes past 1000 tokens per second on commodity GPUs using FP4 quantization and DFlash speculative decoding.

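36Kr has learned that on the evening of June 8, Xiaomi's MiMo technical team officially launched the Xiaomi MiMo-V2.5-Pro-UltraSpeed mode. According to the team, MiMo-V2.5-Pro-UltraSpeed relies on end-to-end engineering optimization of the model inference system to raise inference speed to 1000 tokens/s for the first time, without reducing model capability. It achieves this using only general-purpose GPUs, with no custom chips required.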