DeepSeek

DeepSeek/deepseek-v4-pro

From $0.0116 / 1M tokens

DeepSeek's open-weight flagship 1.6T MoE model with hybrid sparse attention, 1M-token context, competition-grade coding (LiveCodeBench 93.5), and ultra-low pricing — built for code, reasoning, and long-horizon agents.

Chat

More from DeepSeek

README

DeepSeek/deepseek-v4-pro

Supported Functionality

ItemSpecification
InputText
OutputText
Context1,000,000 tokens (1M)
Max Output384,000 tokens
Vision✗ Not Supported
Function Calling✓ Supported (OpenAI / Anthropic-compatible API)

Description

DeepSeek V4 Pro is DeepSeek-AI's open-weight flagship model, released on April 24, 2026 under the MIT License. It is a Mixture-of-Experts (MoE) model with 1.6 trillion total parameters and 49 billion activated per token (61 layers, 384 routed experts plus 1 shared expert, with 6 experts selected per token), pre-trained on 33T tokens. The model uses FP4 + FP8 mixed-precision quantization-aware training (FP4 for MoE experts, FP8 elsewhere) and supports Thinking/Non-Thinking dual modes with three reasoning effort levels (Think Low / High / Max).

V4 Pro's defining breakthrough is efficient million-token-context inference — via a Hybrid Attention design (Compressed Sparse Attention + Heavily Compressed Attention) and Manifold-Constrained Hyper-Connections (mHC), it consumes just 27% of single-token FLOPs and 10% of KV cache compared to V3.2 at 1M context, and is trained with the Muon optimizer. V4-Pro-Max scores 93.5 on LiveCodeBench (#1 across open and closed models), a Codeforces rating of 3,206 (ahead of GPT-5.4's 3,168), and 80.6% on SWE-Bench Verified (within 0.2 of Claude Opus 4.6) — all at roughly 1/7 the price of frontier closed models.

Key Capabilities

  • Competitive Coding: LiveCodeBench Pass@1 93.5 and Codeforces rating 3,206 — the first model placing in the top 23 of all human competitors.
  • Real-World Software Engineering: SWE-Bench Verified 80.6% (within 0.2 of Claude Opus 4.6); Terminal-Bench 2.0 67.9% (ahead of Claude's 65.4%).
  • Million-Token Context Reasoning: 1M-token input plus 384K max output, enabling whole-codebase, long-document, and multi-file analysis in a single pass.
  • Variable Thinking Effort: Think Low / High / Max modes let developers trade latency for accuracy per call.
  • Math & Scientific Reasoning: GPQA Diamond 90.1, IMOAnswerBench 89.8 (14 points ahead of Claude), HMMT 2026 95.2 — strong on hard STEM problems.
  • Agentic Tool Use: Toolathlon 51.8 and MCPAtlas Public 73.6 lead most closed-source peers on tool orchestration and MCP-protocol workflows.
  • Frontier Price-Performance: $1.74/M input + $3.48/M output (cache hits as low as $0.145/M) — roughly 1/7 the cost of GPT-5.4 or Claude Opus.

Technical Strengths

FeatureBenefit
Hybrid Attention (CSA + HCA)Cuts long-context FLOPs to 27% and KV cache to 10% of V3.2, making 1M-token inference economically practical in production
1.6T MoE / 49B ActiveTrillion-scale knowledge capacity with inference cost close to a mid-sized dense model
FP4 + FP8 Quantization-Aware TrainingLow-precision baked in during training avoids post-hoc quantization loss and reduces memory/compute footprint
Muon Optimizer + mHCFaster convergence and stable training at 1T+ scale; mHC preserves signal propagation through deep networks
MIT LicenseWeights freely downloadable on Hugging Face, commercial use and fine-tuning allowed — no vendor lock-in
OpenAI/Anthropic-Compatible APIDrop-in integration for Cursor, Claude Code, Cline, and other tooling with zero code changes

Capability Ratings

DimensionRatingNotes
ReasoningExcellentGPQA Diamond 90.1 and IMOAnswerBench 89.8 lead open-source; HLE 37.7 trails top closed models slightly
CodingTop-tier#1 on LiveCodeBench (93.5) and Codeforces (3,206); SWE-Bench Verified statistically tied with Claude Opus
Creative WritingStrongSolid prose quality, though training and tuning prioritize coding and reasoning over creative output
MultimodalModerateCurrent release is text-only — no native vision or video input
Response SpeedModerateHybrid attention keeps long-context inference efficient, but Think Max mode is verbose
Context WindowHuge1M tokens — among the largest open-source context windows, with sparse attention keeping it affordable

Use Cases

  • Codebase Analysis & Refactoring: Load an entire repository into 1M-token context and run cross-file refactors with frontier-grade coding ability.
  • AI Coding Assistants & IDE Integration: Drop-in replacement backend for Cursor, Claude Code, Cline, and Continue at a fraction of closed-model cost.
  • Competitive Programming & Algorithms: A 3,206 Codeforces rating makes V4 Pro a serious tutor for ACM training, contest prep, and interview practice.
  • Long-Document Legal & Financial Analysis: Million-token context handles entire books, prospectuses, or multi-contract bundles in a single inference.
  • Scientific & Mathematical Research: Strong IMO, HMMT, and GPQA performance makes it useful for math, physics, and biology research assistance.
  • Self-Hosted Enterprise Deployment: MIT license plus open weights enable on-prem rollouts on 8×H100 or 4×H200 clusters for compliance-sensitive workloads.
  • High-Volume API & Agentic Production: Aggressive token pricing with automatic cache discounts is ideal for long-running multi-step agent workloads.

Pricing

Token TypeLinkAI PriceOfficial Price
input$1.392000 / 1M tokens$1.740000 / 1M tokens
output$2.784000 / 1M tokens$3.480000 / 1M tokens
cache_read$0.011600 / 1M tokens$0.014500 / 1M tokens
cache_write$0.000000 / 1M tokens$0.000000 / 1M tokens