deepseek-v4-pro - LinkModel

Supported Functionality

Item	Specification
Input	Text
Output	Text
Context	1,000,000 tokens (1M)
Max Output	384,000 tokens
Vision	✗ Not Supported
Function Calling	✓ Supported (OpenAI / Anthropic-compatible API)

Description

DeepSeek V4 Pro is DeepSeek-AI's open-weight flagship model, released on April 24, 2026 under the MIT License. It is a Mixture-of-Experts (MoE) model with 1.6 trillion total parameters and 49 billion activated per token (61 layers, 384 routed experts plus 1 shared expert, with 6 experts selected per token), pre-trained on 33T tokens. The model uses FP4 + FP8 mixed-precision quantization-aware training (FP4 for MoE experts, FP8 elsewhere) and supports Thinking/Non-Thinking dual modes with three reasoning effort levels (Think Low / High / Max).
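As a quick sanity check on the sparsity figures above, the following sketch computes what share of the model is active per token. It uses only the numbers quoted in the description. Treating the 6 selected experts as routed experts (with the shared expert always on) is an assumption, and the function names are illustrative.

```python
# Figures quoted from the description above; nothing here is measured.
TOTAL_PARAMS = 1.6e12    # 1.6 trillion total parameters
ACTIVE_PARAMS = 49e9     # 49 billion parameters activated per token
ROUTED_EXPERTS = 384     # routed experts
ROUTED_SELECTED = 6      # experts selected per token (assumed to be routed experts)


def active_param_fraction() -> float:
    """Share of all parameters that participate in a single token's forward pass."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def routed_expert_fraction() -> float:
    """Share of routed experts chosen for a single token."""
    return ROUTED_SELECTED / ROUTED_EXPERTS


if __name__ == "__main__":
    print(f"active parameters:   {active_param_fraction():.1%}")
    print(f"routed experts used: {routed_expert_fraction():.1%}")
```

Roughly 3% of the parameters and under 2% of the routed experts are exercised per token, which is the basis for the claim that inference cost is closer to a mid-sized dense model than to a 1.6T one.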

V4 Pro's defining feature is efficient million-token-context inference. Using a Hybrid Attention design (Compressed Sparse Attention + Heavily Compressed Attention) together with Manifold-Constrained Hyper-Connections (mHC), it requires just 27% of the single-token FLOPs and 10% of the KV cache of V3.2 at 1M context. The model is trained with the Muon optimizer. V4-Pro-Max scores 93.5 on LiveCodeBench (#1 across open and closed models), reaches a Codeforces rating of 3,206 (ahead of GPT-5.4's 3,168), and achieves 80.6% on SWE-Bench Verified (within 0.2 points of Claude Opus 4.6), all at roughly 1/7 the price of frontier closed models.
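The relative-cost figures for 1M-context inference can be restated as reduction factors. The sketch below does only that arithmetic. The two ratios come from the text, while the baseline value in the usage line is a made-up placeholder, not a measured V3.2 number.

```python
# Ratios quoted in the text: V4 Pro cost relative to V3.2 at 1M context.
FLOPS_RATIO = 0.27      # single-token FLOPs
KV_CACHE_RATIO = 0.10   # KV cache size


def reduction_factor(ratio: float) -> float:
    """Convert 'uses ratio of the baseline' into 'N times cheaper'."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return 1.0 / ratio


def scale_from_baseline(baseline: float, ratio: float) -> float:
    """Estimate a V4 Pro figure from a hypothetical V3.2 baseline."""
    return baseline * ratio


if __name__ == "__main__":
    print(f"FLOPs reduction:    {reduction_factor(FLOPS_RATIO):.1f}x")
    print(f"KV cache reduction: {reduction_factor(KV_CACHE_RATIO):.1f}x")
    # 80.0 is a placeholder baseline in arbitrary units, for illustration only.
    print(f"KV cache estimate:  {scale_from_baseline(80.0, KV_CACHE_RATIO):.1f}")
```

In other words, the quoted ratios correspond to roughly 3.7 times fewer FLOPs per token and a 10 times smaller KV cache at 1M context.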

Key Capabilities

Competitive Coding: LiveCodeBench Pass@1 of 93.5 and a Codeforces rating of 3,206, making it the first model to place among the top 23 of all human competitors.
Real-World Software Engineering: SWE-Bench Verified 80.6% (within 0.2 of Claude Opus 4.6); Terminal-Bench 2.0 67.9% (ahead of Claude's 65.4%).
Million-Token Context Reasoning: 1M-token input plus 384K max output, enabling whole-codebase, long-document, and multi-file analysis in a single pass.
Variable Thinking Effort: Think Low / High / Max modes let developers trade latency for accuracy per call.
Math & Scientific Reasoning: GPQA Diamond 90.1, IMOAnswerBench 89.8 (14 points ahead of Claude), HMMT 2026 95.2 — strong on hard STEM problems.
Agentic Tool Use: Toolathlon 51.8 and MCPAtlas Public 73.6 lead most closed-source peers on tool orchestration and MCP-protocol workflows.
Frontier Price-Performance: $1.74/M input + $3.48/M output (cache hits as low as $0.145/M) — roughly 1/7 the cost of GPT-5.4 or Claude Opus.

Technical Strengths

Feature	Benefit
Hybrid Attention (CSA + HCA)	Cuts long-context FLOPs to 27% and KV cache to 10% of V3.2, making 1M-token inference economically practical in production
1.6T MoE / 49B Active	Trillion-scale knowledge capacity with inference cost close to a mid-sized dense model
FP4 + FP8 Quantization-Aware Training	Low-precision baked in during training avoids post-hoc quantization loss and reduces memory/compute footprint
Muon Optimizer + mHC	Faster convergence and stable training at 1T+ scale; mHC preserves signal propagation through deep networks
MIT License	Weights freely downloadable on Hugging Face, commercial use and fine-tuning allowed — no vendor lock-in
OpenAI/Anthropic-Compatible API	Drop-in integration for Cursor, Claude Code, Cline, and other tooling, typically requiring only a new base URL, API key, and model name

Capability Ratings

Dimension	Rating	Notes
Reasoning	Excellent	GPQA Diamond 90.1 and IMOAnswerBench 89.8 lead open-source; HLE 37.7 trails top closed models slightly
Coding	Top-tier	#1 on LiveCodeBench (93.5) and Codeforces (3,206); SWE-Bench Verified within 0.2 points of Claude Opus 4.6
Creative Writing	Strong	Solid prose quality, though training and tuning prioritize coding and reasoning over creative output
Multimodal	Limited	Current release is text-only, with no native vision or video input
Response Speed	Moderate	Hybrid attention keeps long-context inference efficient, but Think Max mode is verbose
Context Window	Huge	1M tokens — among the largest open-source context windows, with sparse attention keeping it affordable

Use Cases

Codebase Analysis & Refactoring: Load an entire repository into 1M-token context and run cross-file refactors with frontier-grade coding ability.
AI Coding Assistants & IDE Integration: Drop-in replacement backend for Cursor, Claude Code, Cline, and Continue at a fraction of closed-model cost.
Competitive Programming & Algorithms: A 3,206 Codeforces rating makes V4 Pro a serious tutor for ACM training, contest prep, and interview practice.
Long-Document Legal & Financial Analysis: Million-token context handles entire books, prospectuses, or multi-contract bundles in a single inference.
Scientific & Mathematical Research: Strong IMO, HMMT, and GPQA performance makes it useful for math, physics, and biology research assistance.
Self-Hosted Enterprise Deployment: MIT license plus open weights enable on-prem rollouts on 8×H100 or 4×H200 clusters for compliance-sensitive workloads.
High-Volume API & Agentic Production: Aggressive token pricing with automatic cache discounts is ideal for long-running multi-step agent workloads.
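For the whole-repository and long-document use cases above, a rough pre-flight check can tell whether the input is likely to fit in the 1M-token window. The sketch below uses a crude 4-characters-per-token heuristic, which is an assumption rather than the model's real tokenizer. Whether output tokens count against the window is not stated in the source, so the reserve is left as an optional parameter.

```python
import math

CONTEXT_TOKENS = 1_000_000   # context window from the specification table
MAX_OUTPUT_TOKENS = 384_000  # max output from the specification table
CHARS_PER_TOKEN = 4.0        # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserved_output: int = 0) -> bool:
    """Return True if the estimated input plus any reserve fits in the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserved_output <= CONTEXT_TOKENS


if __name__ == "__main__":
    files = ["def main():\n    pass\n" * 1000, "# README\n" * 500]
    print("estimated tokens:", sum(estimate_tokens(f) for f in files))
    print("fits:", fits_in_context(files))
    print("fits with full output reserve:", fits_in_context(files, MAX_OUTPUT_TOKENS))
```

For real workloads, the provider's tokenizer should replace the heuristic, since code and non-English text can deviate substantially from 4 characters per token.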

Pricing

Token Type	LinkAI Price	Official Price
input	$1.392000 / 1M tokens	$1.740000 / 1M tokens
output	$2.784000 / 1M tokens	$3.480000 / 1M tokens
cache_read	$0.011600 / 1M tokens	$0.014500 / 1M tokens
cache_write	$0.000000 / 1M tokens	$0.000000 / 1M tokens
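The per-million rates in the table translate into a per-request cost as sketched below. The rates are copied from the table (cache_write is free, so it is omitted). The function name and the way cached tokens are subtracted from the input count are illustrative assumptions about billing, not documented behavior.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "linkai": {"input": 1.392, "output": 2.784, "cache_read": 0.0116},
    "official": {"input": 1.74, "output": 3.48, "cache_read": 0.0145},
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate request cost in USD.

    Assumes cached_tokens are the part of input_tokens served from cache
    and are billed at the cache_read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rates = PRICES[provider]
    uncached = input_tokens - cached_tokens
    total = (uncached * rates["input"]
             + cached_tokens * rates["cache_read"]
             + output_tokens * rates["output"])
    return total / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        cost = estimate_cost(name, input_tokens=200_000, output_tokens=8_000,
                             cached_tokens=150_000)
        print(f"{name}: ${cost:.4f}")
```

For agent workloads that resend a long shared prefix on every step, the cached share dominates the input bill, so the cache_read rate matters more than the headline input price.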

DeepSeek/deepseek-v4-pro
