DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

DeepSeek released DSpark, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached.

The DeepSeek research team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters. The work targets one problem: faster large-model inference in busy production serving.

TL;DR

DSpark pairs a parallel draft backbone with a tiny sequential head to cut suffix decay.

A confidence head and load-aware scheduler verify more tokens when GPUs are idle, fewer when busy.
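To make the second point concrete, here is a minimal sketch of how a load-aware policy might decide how many drafted tokens to send for verification. It is an illustration under stated assumptions, not DSpark's actual scheduler: the function name `choose_verify_length`, the linear budget, the load-dependent threshold, and the use of a running product of confidences are all hypothetical choices made for this example.

```python
from typing import List


def choose_verify_length(
    confidences: List[float],
    gpu_load: float,
    max_len: int = 8,
    min_len: int = 1,
) -> int:
    """Pick how many drafted tokens to hand to the target model.

    confidences: per-position scores in [0, 1], standing in for the
                 output of a confidence head (hypothetical interface).
    gpu_load:    utilization in [0, 1]; 0.0 is idle, 1.0 is saturated
                 (hypothetical signal from the serving system).
    """
    if not 0.0 <= gpu_load <= 1.0:
        raise ValueError("gpu_load must be in [0, 1]")

    # Assumption: the verification budget shrinks linearly as load rises.
    budget = min_len + int((max_len - min_len) * (1.0 - gpu_load))

    # Assumption: a busier GPU demands more confident drafts.
    threshold = 0.5 + 0.4 * gpu_load

    # Keep the longest prefix whose estimated chance of being accepted
    # in full (running product of confidences) stays above the threshold.
    length = 0
    survival = 1.0
    for score in confidences[:budget]:
        survival *= score
        if survival < threshold:
            break
        length += 1

    # Always verify at least min_len tokens so decoding makes progress.
    return max(min_len, length)


if __name__ == "__main__":
    scores = [0.95, 0.9, 0.85, 0.6, 0.5, 0.4]
    print("idle:", choose_verify_length(scores, gpu_load=0.0))
    print("busy:", choose_verify_length(scores, gpu_load=1.0))
```

With the sample scores above, the idle case keeps a longer drafted prefix than the saturated case, which is the behavior the TL;DR describes. A real scheduler would presumably weigh batch size and latency targets as well.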
