kimi-k2.6 - LinkModel

Supported Functionality

Item	Specification
Input	Text, Image, Video
Output	Text
Context	262,144 tokens (~256K)
Max Output	262,142 tokens
Vision	✓ Supported (native multimodal via MoonViT vision encoder)
Function Calling	✓ Supported (OpenAI / Anthropic-compatible API)

Description

Kimi K2.6 is Moonshot AI's open-weight flagship model, released on April 20, 2026 under a Modified MIT License. Built on a Mixture-of-Experts (MoE) architecture, it has 1 trillion total parameters with 32 billion activated per token (8 of 384 experts plus 1 shared expert), uses 61 layers with Multi-head Latent Attention (MLA) and SwiGLU activation, and ships natively with INT4 quantization.

K2.6's defining advance is long-horizon autonomous execution — the model can run for 12+ hours and 4,000+ tool calls on a single engineering goal without human intervention. It also introduces an Agent Swarm capable of orchestrating up to 300 domain-specialized sub-agents across 4,000 coordinated steps. On SWE-Bench Pro it scores 58.6, matching GPT-5.5 and surpassing Claude Opus 4.6 (53.4) and Gemini 3.1 Pro (54.2); on Humanity's Last Exam (with tools) it leads every frontier model at 54.0.

Key Capabilities

Long-Horizon Coding: Sustains 12+ hour autonomous engineering sessions; scores 80.2 on SWE-Bench Verified and 89.6 on LiveCodeBench v6.
Agent Swarm Orchestration: Dispatches up to 300 sub-agents across 4,000 coordinated steps in one run; scores 86.3 on BrowseComp in swarm mode.
Native Multimodality: Unified text/image/video processing via the MoonViT encoder — no separate vision pipeline required.
Coding-Driven Design: Generates full front-end interfaces from sketches or natural language, wiring authentication and databases automatically (50%+ gain over K2.5 on Vercel's Next.js benchmark).
Frontier Agentic Tool Use: Leads on HLE (with tools) at 54.0 and reaches 66.7 on Terminal-Bench 2.0.
Claw Groups Collaboration: Humans and heterogeneous agents on any device or model can collaborate in one shared workspace, with K2.6 as adaptive coordinator.
Long Context with Caching: 262K-token window plus prefix caching (as low as $0.15/M cached tokens at some providers) — ideal for long sessions and tool-heavy workflows.

Technical Strengths

Feature	Benefit
1T MoE / 32B Active	Trillion-scale knowledge breadth with inference cost close to a 32B dense model, dramatically lowering per-token cost
Native INT4 Quantization	Lower memory footprint and deployment cost; on-prem and self-hosted inference without post-hoc quality loss
Thinking / Instant Dual Mode	Deep chain-of-thought for hard problems, fast direct answers for simple queries — developer-controllable per call
MuonClip Optimizer	Zero training instability at 1T scale across 15.5T pretraining tokens
Modified MIT License	Weights freely downloadable, self-hostable, and commercially usable — no vendor lock-in
OpenAI/Anthropic-Compatible API	Existing client code migrates in minutes; works with Kimi Code CLI, vLLM, SGLang, and KTransformers

Capability Ratings

Dimension	Rating	Notes
Reasoning	Excellent	Leads HLE-with-tools; AIME 2026 (96.4) just below GPT-5.5 but still top-tier
Coding	Top-tier	Matches GPT-5.5 on SWE-Bench Pro (58.6); beats Claude Opus 4.6 on LiveCodeBench v6
Creative Writing	Strong	Fluent prose, though tuning prioritizes engineering and agentic workloads over creative output
Multimodal	Excellent	Unified native architecture for text/image/video via MoonViT, including visual coding
Response Speed	Moderate	~59 tokens/s on first-party API; Thinking mode is verbose, Instant mode is faster
Context Window	Huge	262K tokens — fits full codebases, long sessions, and large tool schemas simultaneously

Use Cases

Autonomous Software Engineering: Hand off long-horizon dev tasks — e.g. a 13-hour refactor of a financial matching engine yielded a 185% median throughput gain.
AI Coding Assistants & IDE Integration: Works with Kimi Code CLI, Cursor, VS Code, and other IDEs for long-context, low-cost code generation and refactoring.
Multi-Agent R&D Workflows: Use Agent Swarm to parallelize hundreds of sub-agents producing research docs, websites, or analytical reports.
Front-End Auto-Generation: Turn design specs or natural language into production-ready Next.js / React apps with backend wiring included.
Long-Context Retrieval & QA: 256K window enables end-to-end reasoning over entire books, repositories, or large PDFs.
Self-Hosted Enterprise Deployment: Open weights + INT4 make on-prem and private-cloud rollouts compliant and cost-efficient.
Proactive Background Agents: 24/7 schedule management, code execution, and cross-platform orchestration for ops automation and monitoring.

Token Type	LinkAI Price	Official Price
input	$0.807500 / 1M tokens	$0.950000 / 1M tokens
output	$3.400000 / 1M tokens	$4.000000 / 1M tokens
cache_read	$0.136000 / 1M tokens	$0.160000 / 1M tokens
cache_write	$0.000000 / 1M tokens	$0.000000 / 1M tokens

Moonshot/kimi-k2.6

More from Moonshot

README