DeepSeek unveils DSpark, promising 60% to 85% faster inference

On June 27, DeepSeek released DSpark, a speculative decoding framework that it says speeds up per-user generation by 60% to 85% on its DeepSeek-V4 Flash model and by 57% to 78% on the Pro variant.

DSpark isn’t a new model. It’s an engineering optimization layered on top of existing DeepSeek-V4 checkpoints, so the company got these speedups without training a new or bigger model.

How DSpark actually works

DSpark uses what DeepSeek calls a “semi-parallel” method, combining high-throughput parallel generation with adaptive verification. Rather than producing and checking one token at a time, DSpark speculatively drafts several candidate tokens at once, then selectively verifies only the most promising guesses.
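To make the draft-then-verify idea concrete, here is a minimal sketch of generic greedy speculative decoding. It is not DSpark’s code and does not reproduce its semi-parallel design: the toy `target` and `draft` functions, the draft length `k`, and the `min_conf` cutoff (a hypothetical stand-in for “verifying only the promising guesses”) are all illustrative assumptions. In a real system the verification loop is a single batched forward pass of the large model, so the number of verification rounds, not the number of tokens, is what drives the speedup.

```python
from typing import Callable, List, Tuple

# A toy "model": given the token prefix, return (next_token, confidence).
# Real systems use neural networks; these integer functions are hypothetical.
Model = Callable[[List[int]], Tuple[int, float]]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tok, _ = target(tokens)
        tokens.append(tok)
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4,
                       min_conf: float = 0.5) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification_rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. Draft: the cheap model proposes up to k tokens, stopping early
        #    when its confidence falls below min_conf (hypothetical cutoff).
        drafted: List[int] = []
        for _ in range(k):
            tok, conf = draft(tokens + drafted)
            if conf < min_conf:
                break
            drafted.append(tok)
        # 2. Verify: keep the longest drafted prefix the target agrees with.
        #    A real system scores all drafted positions in one forward pass;
        #    this loop is sequential only for clarity.
        rounds += 1
        accepted: List[int] = []
        for tok in drafted:
            expected, _ = target(tokens + accepted)
            if expected != tok:
                break
            accepted.append(tok)
        # 3. Always keep the target's own next token after the accepted
        #    prefix, so every round makes progress of at least one token.
        correction, _ = target(tokens + accepted)
        tokens.extend(accepted + [correction])
    # The last round may overshoot n_new; trim to the requested length.
    return tokens[: len(prompt) + n_new], rounds


def target(prefix: List[int]) -> Tuple[int, float]:
    # Toy "large" model: next token is the sum of the last two, mod 10.
    return (prefix[-1] + prefix[-2]) % 10, 1.0


def draft(prefix: List[int]) -> Tuple[int, float]:
    # Toy "small" model: usually agrees with target, but guesses wrong
    # after a 7 and reports low confidence after a 0.
    nxt = (prefix[-1] + prefix[-2]) % 10
    if prefix[-1] == 7:
        return (nxt + 1) % 10, 0.9
    return nxt, (0.3 if prefix[-1] == 0 else 0.9)


out, rounds = speculative_decode(target, draft, [1, 1], n_new=20)
```

With these toy models, the speculative output matches plain greedy decoding token for token, but it needs 7 verification rounds instead of 20 sequential target calls. That output-preserving property under greedy decoding is what lets speculative decoding in general sit on top of an unchanged model; sampled decoding uses a probabilistic accept/reject rule instead of exact matching.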

At the high end, the throughput gains are even larger than the per-user speedups. Depending on the concurrency level, DeepSeek reports throughput improvements of 51% to 400%.
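As a rough worked example, assume “X% faster” means per-user decoding speed in tokens per second rises by X%, and ignore fixed costs such as prompt processing. The time to decode a response of fixed length then scales by 1/(1 + s), where s is the fractional speedup:

```latex
\begin{aligned}
\frac{t_{\text{new}}}{t_{\text{old}}} &= \frac{1}{1+s} \\
\text{Flash: } s \in [0.60,\ 0.85] &\;\Rightarrow\; \frac{t_{\text{new}}}{t_{\text{old}}} \approx [0.541,\ 0.625] \\
\text{Pro: } s \in [0.57,\ 0.78] &\;\Rightarrow\; \frac{t_{\text{new}}}{t_{\text{old}}} \approx [0.562,\ 0.637]
\end{aligned}
```

Under the same reading, if the 51% to 400% throughput figures are relative increases, they correspond to roughly 1.51 to 5 times as many tokens per second served in aggregate.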