Story from 1 source

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

FP8 and INT8 KV caches cut attention-state memory by roughly 50% relative to a 16-bit cache, but they shift the target model's logit distribution, and that can quietly halve the gains from speculative decoding. A tour of vLLM v0.22.1.
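
The roughly 50% figure follows from element width alone: FP8 and INT8 store one byte per cached value, where FP16 or BF16 store two. A minimal back-of-the-envelope sketch, assuming a hypothetical model shape (32 layers, 8 KV heads, head dimension 128) that is not taken from the article, and ignoring the small extra storage for quantization scales:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem):
    """Approximate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both K and V in every layer.
    Quantization scale factors are ignored here, which is one reason the
    real-world saving is "about" 50% rather than exactly 50%.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192, batch_size=1)

bf16_bytes = kv_cache_bytes(**shape, bytes_per_elem=2)  # 16-bit K and V
fp8_bytes = kv_cache_bytes(**shape, bytes_per_elem=1)   # 8-bit K and V (FP8 or INT8)

saving = 1 - fp8_bytes / bf16_bytes
print(f"16-bit cache: {bf16_bytes / 2**30:.2f} GiB")
print(f"8-bit cache:  {fp8_bytes / 2**30:.2f} GiB")
print(f"saving:       {saving:.0%}")
```

The estimate covers memory only. It says nothing about the logit shift or the speculative-decoding acceptance rate, which have to be measured on the actual model.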

Reported by

dev.to

Timeline

Saturday, June 6, 2026 · dev.to
KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break