DeepSeek/deepseek-v4-pro
DeepSeek's open-weight flagship 1.6T MoE model with hybrid sparse attention, 1M-token context, competition-grade coding (LiveCodeBench 93.5), and ultra-low pricing — built for code, reasoning, and long-horizon agents.
More from DeepSeek
README
DeepSeek/deepseek-v4-pro
Supported Functionality
| Item | Specification |
|---|---|
| Input | Text |
| Output | Text |
| Context | 1,000,000 tokens (1M) |
| Max Output | 384,000 tokens |
| Vision | ✗ Not Supported |
| Function Calling | ✓ Supported (OpenAI / Anthropic-compatible API) |
Description
DeepSeek V4 Pro is DeepSeek-AI's open-weight flagship model, released on April 24, 2026 under the MIT License. It is a Mixture-of-Experts (MoE) model with 1.6 trillion total parameters and 49 billion activated per token (61 layers, 384 routed experts plus 1 shared expert, with 6 experts selected per token), pre-trained on 33T tokens. The model uses FP4 + FP8 mixed-precision quantization-aware training (FP4 for MoE experts, FP8 elsewhere) and supports Thinking/Non-Thinking dual modes with three reasoning effort levels (Think Low / High / Max).
V4 Pro's defining breakthrough is efficient million-token-context inference — via a Hybrid Attention design (Compressed Sparse Attention + Heavily Compressed Attention) and Manifold-Constrained Hyper-Connections (mHC), it consumes just 27% of single-token FLOPs and 10% of KV cache compared to V3.2 at 1M context, and is trained with the Muon optimizer. V4-Pro-Max scores 93.5 on LiveCodeBench (#1 across open and closed models), a Codeforces rating of 3,206 (ahead of GPT-5.4's 3,168), and 80.6% on SWE-Bench Verified (within 0.2 of Claude Opus 4.6) — all at roughly 1/7 the price of frontier closed models.
Key Capabilities
- Competitive Coding: LiveCodeBench Pass@1 93.5 and Codeforces rating 3,206 — the first model placing in the top 23 of all human competitors.
- Real-World Software Engineering: SWE-Bench Verified 80.6% (within 0.2 of Claude Opus 4.6); Terminal-Bench 2.0 67.9% (ahead of Claude's 65.4%).
- Million-Token Context Reasoning: 1M-token input plus 384K max output, enabling whole-codebase, long-document, and multi-file analysis in a single pass.
- Variable Thinking Effort: Think Low / High / Max modes let developers trade latency for accuracy per call.
- Math & Scientific Reasoning: GPQA Diamond 90.1, IMOAnswerBench 89.8 (14 points ahead of Claude), HMMT 2026 95.2 — strong on hard STEM problems.
- Agentic Tool Use: Toolathlon 51.8 and MCPAtlas Public 73.6 lead most closed-source peers on tool orchestration and MCP-protocol workflows.
- Frontier Price-Performance: $1.74/M input + $3.48/M output (cache hits as low as $0.145/M) — roughly 1/7 the cost of GPT-5.4 or Claude Opus.
Technical Strengths
| Feature | Benefit |
|---|---|
| Hybrid Attention (CSA + HCA) | Cuts long-context FLOPs to 27% and KV cache to 10% of V3.2, making 1M-token inference economically practical in production |
| 1.6T MoE / 49B Active | Trillion-scale knowledge capacity with inference cost close to a mid-sized dense model |
| FP4 + FP8 Quantization-Aware Training | Low-precision baked in during training avoids post-hoc quantization loss and reduces memory/compute footprint |
| Muon Optimizer + mHC | Faster convergence and stable training at 1T+ scale; mHC preserves signal propagation through deep networks |
| MIT License | Weights freely downloadable on Hugging Face, commercial use and fine-tuning allowed — no vendor lock-in |
| OpenAI/Anthropic-Compatible API | Drop-in integration for Cursor, Claude Code, Cline, and other tooling with zero code changes |
Capability Ratings
| Dimension | Rating | Notes |
|---|---|---|
| Reasoning | Excellent | GPQA Diamond 90.1 and IMOAnswerBench 89.8 lead open-source; HLE 37.7 trails top closed models slightly |
| Coding | Top-tier | #1 on LiveCodeBench (93.5) and Codeforces (3,206); SWE-Bench Verified statistically tied with Claude Opus |
| Creative Writing | Strong | Solid prose quality, though training and tuning prioritize coding and reasoning over creative output |
| Multimodal | Moderate | Current release is text-only — no native vision or video input |
| Response Speed | Moderate | Hybrid attention keeps long-context inference efficient, but Think Max mode is verbose |
| Context Window | Huge | 1M tokens — among the largest open-source context windows, with sparse attention keeping it affordable |
Use Cases
- Codebase Analysis & Refactoring: Load an entire repository into 1M-token context and run cross-file refactors with frontier-grade coding ability.
- AI Coding Assistants & IDE Integration: Drop-in replacement backend for Cursor, Claude Code, Cline, and Continue at a fraction of closed-model cost.
- Competitive Programming & Algorithms: A 3,206 Codeforces rating makes V4 Pro a serious tutor for ACM training, contest prep, and interview practice.
- Long-Document Legal & Financial Analysis: Million-token context handles entire books, prospectuses, or multi-contract bundles in a single inference.
- Scientific & Mathematical Research: Strong IMO, HMMT, and GPQA performance makes it useful for math, physics, and biology research assistance.
- Self-Hosted Enterprise Deployment: MIT license plus open weights enable on-prem rollouts on 8×H100 or 4×H200 clusters for compliance-sensitive workloads.
- High-Volume API & Agentic Production: Aggressive token pricing with automatic cache discounts is ideal for long-running multi-step agent workloads.
Pricing
| Token Type | LinkAI Price | Official Price |
|---|---|---|
| input | $1.392000 / 1M tokens | $1.740000 / 1M tokens |
| output | $2.784000 / 1M tokens | $3.480000 / 1M tokens |
| cache_read | $0.011600 / 1M tokens | $0.014500 / 1M tokens |
| cache_write | $0.000000 / 1M tokens | $0.000000 / 1M tokens |