DeepSeek releases DSpark speculative decoding, 60–85% faster V4 inference
What happened
DeepSeek on June 27 open-sourced DSpark ("Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation"). The framework speeds up per-user generation by 60–85% on DeepSeek-V4 Flash and 57–78% on V4 Pro relative to an MTP-1 baseline, without retraining the underlying model.
Context and impact
The release targets the sequential bottleneck of token-by-token inference, letting V4 feel substantially faster on existing hardware, which matters given US export controls on advanced chips bound for China. It is also DeepSeek's first major technical release since its latest funding round, and a sign that strong open-source contributions to core infrastructure continue to come out of China.
Details
- Speedup: 60–85% on V4 Flash, 57–78% on V4 Pro vs. MTP-1 baseline
- Architecture: pairs a parallel draft backbone with a sequential head, a confidence head, and a load-aware scheduler
- Open-source checkpoints and training code
- Paper at arXiv:2606.19348
- Top of Hacker News (771 points, 330 comments)
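For readers unfamiliar with the technique, the general idea behind confidence-gated speculative decoding can be sketched in a few lines. This is a toy illustration of generic greedy speculative decoding, not DSpark's actual algorithm: the parallel draft backbone, sequential head, and load-aware scheduler are not modeled, and every name here (`propose`, `speculative_step`, `generate`, `toy_target`, `toy_draft`, `min_conf`, `max_draft`) is hypothetical. The assumption is that a cheap draft model proposes tokens along with a confidence score, drafting stops once confidence falls below a threshold, and the target model then verifies the proposals.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interfaces (assumptions for illustration only):
#   target(prefix) -> the target model's greedy next token
#   draft(prefix)  -> (proposed next token, confidence in [0, 1])
TargetFn = Callable[[List[Token]], Token]
DraftFn = Callable[[List[Token]], Tuple[Token, float]]


def propose(draft: DraftFn, prefix: List[Token],
            max_draft: int, min_conf: float) -> List[Token]:
    """Draft up to max_draft tokens, stopping early on low confidence."""
    proposed: List[Token] = []
    for _ in range(max_draft):
        tok, conf = draft(prefix + proposed)
        if conf < min_conf:
            break  # not worth spending verification work on an unsure guess
        proposed.append(tok)
    return proposed


def speculative_step(target: TargetFn, draft: DraftFn, prefix: List[Token],
                     max_draft: int = 4, min_conf: float = 0.5) -> List[Token]:
    """One verification step; always yields at least one new token."""
    accepted: List[Token] = []
    for tok in propose(draft, prefix, max_draft, min_conf):
        # A real system would score all drafted positions in one batched
        # forward pass of the target; here they are checked one by one.
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)
    # Every draft matched (or none was proposed): the target adds one token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: TargetFn, draft: DraftFn, prompt: List[Token],
             n_tokens: int, **kw) -> Tuple[List[Token], int]:
    """Generate n_tokens; returns (sequence, number of verification steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, **kw))
        steps += 1
    return out[: len(prompt) + n_tokens], steps


def toy_target(prefix: List[Token]) -> Token:
    """Toy 'large model': counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[Token]) -> Tuple[Token, float]:
    """Toy 'small model': usually right, unsure and wrong after some tokens."""
    last = prefix[-1]
    if last % 3 == 2:
        return (last + 2) % 10, 0.3
    return (last + 1) % 10, 0.9
```

With these toy models, `generate(toy_target, toy_draft, [0], 12)` produces the same 12 tokens as plain greedy decoding with `toy_target`, but in 4 verification steps instead of 12. Because each drafted token is checked against the target, a poor draft model costs speed, not output quality.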
Source: MarkTechPost