DeepSeek just launched its fourth generation of flagship models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, both designed for highly efficient million-token context inference.
DeepSeek-V4-Pro is the largest model in the family, with 1.6T total parameters and 49B active parameters. DeepSeek-V4-Flash is a smaller 284B-parameter model with 13B active parameters, designed for higher-speed, higher-efficiency workloads. Both models support up to a 1M-token context window, opening new possibilities for long-context coding, document analysis, retrieval, and agentic AI workflows.
| Specification | DeepSeek-V4-Pro | DeepSeek-V4-Flash |
| --- | --- | --- |
| Modality | Text | Text |
| Total parameters | 1.6T | 284B |
| Active parameters | 49B | 13B |
| Context length | 1M tokens | 1M tokens |
| Max output length | Up to 384K tokens through DeepSeek API docs | Up to 384K tokens through DeepSeek API docs |
| Primary use cases | Advanced reasoning, coding, long-context agents | High-speed efficiency, chat, routing, summarization |
| License | MIT | MIT |

Table 1. Specifications for the DeepSeek V4 model family
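To make the sparsity of the two models concrete, the short sketch below computes the share of parameters that are active per token from the total and active counts in Table 1. The dictionary layout and the `active_fraction` helper are illustrative names, not part of any DeepSeek tooling, and the counts are the rounded figures quoted above.

```python
# Parameter counts from Table 1, in billions (rounded figures from the article).
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "active_b": 49.0},
    "DeepSeek-V4-Flash": {"total_b": 284.0, "active_b": 13.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters that are active per token."""
    return active_b / total_b


for name, params in MODELS.items():
    frac = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {frac:.1%} of parameters active per token")
```

With these figures, V4-Pro activates roughly 3% of its parameters per token and V4-Flash roughly 4.6%, which is the usual MoE trade: a large total capacity with a much smaller per-token compute footprint.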
Architectural innovations for long-context inference
The V4 family builds on the DeepSeek MoE architecture, with an increased focus on optimizing the attention component of the transformer architecture. These innovations are designed to achieve a 73% reduction in per-token inference FLOPs and a 90% reduction in KV cache memory burden compared with DeepSeek-V3.2.
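As a rough illustration of what those percentages imply, the sketch below converts each reduction into the fraction of the DeepSeek-V3.2 cost that remains and into a capacity multiplier at a fixed budget. It assumes, as a simplification not stated in the source, that the reductions apply uniformly and that KV cache memory grows linearly with context length; the function names are hypothetical.

```python
# Reductions quoted in the article, relative to DeepSeek-V3.2.
FLOPS_REDUCTION = 0.73     # per-token inference FLOPs
KV_CACHE_REDUCTION = 0.90  # KV cache memory


def remaining_fraction(reduction: float) -> float:
    """Fraction of the baseline cost left after the given reduction."""
    return 1.0 - reduction


def capacity_multiplier(reduction: float) -> float:
    """How much more work fits in the same budget.

    Assumes cost scales linearly with the workload (a simplification).
    """
    return 1.0 / remaining_fraction(reduction)


print(f"FLOPs per token vs. V3.2: {remaining_fraction(FLOPS_REDUCTION):.2f}x")
print(f"KV cache vs. V3.2: {remaining_fraction(KV_CACHE_REDUCTION):.2f}x")
print(f"Tokens cached per GB vs. V3.2: {capacity_multiplier(KV_CACHE_REDUCTION):.1f}x")
```

Under these assumptions, a 90% KV cache reduction means the same memory budget holds about ten times as many cached tokens, which is what makes a 1M-token window practical to serve.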












