DeepSeek-V4: a million-token context that agents can actually use


DeepSeek released V4 today. Two MoE checkpoints are on the Hub: DeepSeek-V4-Pro at 1.6T total parameters with 49B active, and DeepSeek-V4-Flash at 284B total with 13B active. Both have a 1M-token context window. The benchmark numbers are competitive but not SOTA, and that is not the point. The real innovation is that DeepSeek-V4 is designed for efficient long-context inference, which makes it one of the best candidates for agentic tasks.
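To put the two checkpoints' sizes in perspective, here is a small arithmetic sketch of the share of parameters active per token. It uses only the total and active counts quoted above; the helper name `active_fraction` is made up for illustration, and 1.6T is written as 1600B.

```python
# Parameter counts as stated in the post, in billions: (total, active).
CHECKPOINTS = {
    "DeepSeek-V4-Pro": (1600, 49),    # 1.6T total, 49B active
    "DeepSeek-V4-Flash": (284, 13),   # 284B total, 13B active
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters used for each token (hypothetical helper)."""
    return active_b / total_b


for name, (total_b, active_b) in CHECKPOINTS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
# DeepSeek-V4-Pro: 3.1% of parameters active per token
# DeepSeek-V4-Flash: 4.6% of parameters active per token
```

Both checkpoints activate only a few percent of their weights per token. This ratio describes compute per token only; it says nothing about KV cache size, which depends on the attention design.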

The focus is long-running agentic workloads. Running a frontier open model as an agent today breaks in predictable ways. The model stops. You reprompt. The trace blows past the context budget, or the KV cache fills the GPU, or tool-call round trips degrade halfway through a long task. V4 is built to fix these known failures and to point the way for the community to follow.

This post covers three things: what the architecture does differently to make long-context inference cheap, the agent-specific post-training decisions that compound on top of it, and some takeaways from the paper that help reason about these changes.
