On June 27, DeepSeek released DSpark, a speculative decoding framework that raises per-user generation speed by 60% to 85% on its DeepSeek-V4 Flash model and by 57% to 78% on the Pro variant.
DSpark isn’t a new model. It’s an engineering optimization layered on top of existing DeepSeek-V4 checkpoints, so the company didn’t need to train a bigger model to get meaningfully faster generation.
How DSpark actually works
DSpark uses what DeepSeek calls a “semi-parallel” method that combines high-throughput parallel generation with adaptive verification. Instead of generating and checking one token at a time, DSpark speculatively generates multiple candidate tokens simultaneously, then selectively verifies only the promising guesses.
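To make the draft-then-verify idea concrete, here is a minimal Python sketch of generic greedy speculative decoding. It is not DeepSeek's implementation: the article does not say how DSpark's semi-parallel generation or adaptive verification work internally, so both are replaced here by the simplest textbook scheme. The names target_model, draft_model, greedy_decode, and speculative_decode are hypothetical, and the two "models" are toy arithmetic functions standing in for real networks.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]


def target_model(context: List[Token]) -> Token:
    # Toy stand-in for the large, slow model (assumption, not a real network).
    return (sum(context) * 31 + len(context)) % 50


def draft_model(context: List[Token]) -> Token:
    # Toy stand-in for a cheap drafter: it agrees with the target except
    # when the context length is a multiple of 4, where it guesses wrong.
    guess = target_model(context)
    return guess if len(context) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], num_new: int, target: Model) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(num_new):
        out.append(target(out))
    return out


def speculative_decode(
    prompt: List[Token], num_new: int, k: int, target: Model, draft: Model
) -> Tuple[List[Token], int]:
    out = list(prompt)
    verify_rounds = 0  # each round stands for one batched target pass
    while len(out) - len(prompt) < num_new:
        # 1) Draft k candidate tokens with the cheap model.
        ctx = list(out)
        candidates = []
        for _ in range(k):
            tok = draft(ctx)
            candidates.append(tok)
            ctx.append(tok)

        # 2) Verify. A real system would score all k positions in a single
        #    forward pass of the target; this toy loop checks them one by one.
        verify_rounds += 1
        ctx = list(out)
        for tok in candidates:
            expected = target(ctx)
            if tok == expected:
                ctx.append(tok)        # accept the drafted token
            else:
                ctx.append(expected)   # replace the first wrong guess
                break                  # discard everything drafted after it
        else:
            ctx.append(target(ctx))    # all accepted: keep one bonus token
        out = ctx
    return out[: len(prompt) + num_new], verify_rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode([1, 2, 3], 20, 4, target_model, draft_model)
    print(tokens == greedy_decode([1, 2, 3], 20, target_model), rounds)
```

Because every accepted token is checked against the target model, the output matches ordinary one-token-at-a-time decoding. The saving comes from needing fewer target passes whenever the drafter guesses well.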
The throughput gains can be larger still than the per-user speed numbers. Depending on the concurrency level, DeepSeek reports throughput improvements ranging from 51% to 400%.
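Percentage gains are easy to misread, so the short Python snippet below converts the quoted ranges into multipliers. It assumes each percentage is an improvement over the same model running without DSpark, which the figures imply but the article does not spell out. The helper name speedup_multiplier is hypothetical.

```python
def speedup_multiplier(percent_gain: float) -> float:
    """Convert an 'X% improvement' figure into a multiplier on the baseline."""
    return 1.0 + percent_gain / 100.0


# Ranges quoted in the article, as percent improvement (low, high).
reported = {
    "V4 Flash, per-user speed": (60, 85),
    "V4 Pro, per-user speed": (57, 78),
    "throughput": (51, 400),
}

for label, (low, high) in reported.items():
    print(f"{label}: {speedup_multiplier(low):.2f}x to {speedup_multiplier(high):.2f}x")
```

Read this way, a 400% throughput improvement means five times the baseline throughput, not four times.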










