Moonshot/kimi-k2.6
Moonshot AI's open-weight 1T-parameter MoE multimodal agentic model, excelling in long-horizon coding, 300-sub-agent swarm orchestration, and native tool use — built for autonomous software engineering and agent workflows.
More from Moonshot
README
Moonshot/kimi-k2.6
Supported Functionality
| Item | Specification |
|---|---|
| Input | Text, Image, Video |
| Output | Text |
| Context | 262,144 tokens (~256K) |
| Max Output | 262,142 tokens |
| Vision | ✓ Supported (native multimodal via MoonViT vision encoder) |
| Function Calling | ✓ Supported (OpenAI / Anthropic-compatible API) |
Description
Kimi K2.6 is Moonshot AI's open-weight flagship model, released on April 20, 2026 under a Modified MIT License. Built on a Mixture-of-Experts (MoE) architecture, it has 1 trillion total parameters with 32 billion activated per token (8 of 384 experts plus 1 shared expert), uses 61 layers with Multi-head Latent Attention (MLA) and SwiGLU activation, and ships natively with INT4 quantization.
K2.6's defining advance is long-horizon autonomous execution — the model can run for 12+ hours and 4,000+ tool calls on a single engineering goal without human intervention. It also introduces an Agent Swarm capable of orchestrating up to 300 domain-specialized sub-agents across 4,000 coordinated steps. On SWE-Bench Pro it scores 58.6, matching GPT-5.5 and surpassing Claude Opus 4.6 (53.4) and Gemini 3.1 Pro (54.2); on Humanity's Last Exam (with tools) it leads every frontier model at 54.0.
Key Capabilities
- Long-Horizon Coding: Sustains 12+ hour autonomous engineering sessions; scores 80.2 on SWE-Bench Verified and 89.6 on LiveCodeBench v6.
- Agent Swarm Orchestration: Dispatches up to 300 sub-agents across 4,000 coordinated steps in one run; scores 86.3 on BrowseComp in swarm mode.
- Native Multimodality: Unified text/image/video processing via the MoonViT encoder — no separate vision pipeline required.
- Coding-Driven Design: Generates full front-end interfaces from sketches or natural language, wiring authentication and databases automatically (50%+ gain over K2.5 on Vercel's Next.js benchmark).
- Frontier Agentic Tool Use: Leads on HLE (with tools) at 54.0 and reaches 66.7 on Terminal-Bench 2.0.
- Claw Groups Collaboration: Humans and heterogeneous agents on any device or model can collaborate in one shared workspace, with K2.6 as adaptive coordinator.
- Long Context with Caching: 262K-token window plus prefix caching (as low as $0.15/M cached tokens at some providers) — ideal for long sessions and tool-heavy workflows.
Technical Strengths
| Feature | Benefit |
|---|---|
| 1T MoE / 32B Active | Trillion-scale knowledge breadth with inference cost close to a 32B dense model, dramatically lowering per-token cost |
| Native INT4 Quantization | Lower memory footprint and deployment cost; on-prem and self-hosted inference without post-hoc quality loss |
| Thinking / Instant Dual Mode | Deep chain-of-thought for hard problems, fast direct answers for simple queries — developer-controllable per call |
| MuonClip Optimizer | Zero training instability at 1T scale across 15.5T pretraining tokens |
| Modified MIT License | Weights freely downloadable, self-hostable, and commercially usable — no vendor lock-in |
| OpenAI/Anthropic-Compatible API | Existing client code migrates in minutes; works with Kimi Code CLI, vLLM, SGLang, and KTransformers |
Capability Ratings
| Dimension | Rating | Notes |
|---|---|---|
| Reasoning | Excellent | Leads HLE-with-tools; AIME 2026 (96.4) just below GPT-5.5 but still top-tier |
| Coding | Top-tier | Matches GPT-5.5 on SWE-Bench Pro (58.6); beats Claude Opus 4.6 on LiveCodeBench v6 |
| Creative Writing | Strong | Fluent prose, though tuning prioritizes engineering and agentic workloads over creative output |
| Multimodal | Excellent | Unified native architecture for text/image/video via MoonViT, including visual coding |
| Response Speed | Moderate | ~59 tokens/s on first-party API; Thinking mode is verbose, Instant mode is faster |
| Context Window | Huge | 262K tokens — fits full codebases, long sessions, and large tool schemas simultaneously |
Use Cases
- Autonomous Software Engineering: Hand off long-horizon dev tasks — e.g. a 13-hour refactor of a financial matching engine yielded a 185% median throughput gain.
- AI Coding Assistants & IDE Integration: Works with Kimi Code CLI, Cursor, VS Code, and other IDEs for long-context, low-cost code generation and refactoring.
- Multi-Agent R&D Workflows: Use Agent Swarm to parallelize hundreds of sub-agents producing research docs, websites, or analytical reports.
- Front-End Auto-Generation: Turn design specs or natural language into production-ready Next.js / React apps with backend wiring included.
- Long-Context Retrieval & QA: 256K window enables end-to-end reasoning over entire books, repositories, or large PDFs.
- Self-Hosted Enterprise Deployment: Open weights + INT4 make on-prem and private-cloud rollouts compliant and cost-efficient.
- Proactive Background Agents: 24/7 schedule management, code execution, and cross-platform orchestration for ops automation and monitoring.
Pricing
| Token Type | LinkAI Price | Official Price |
|---|---|---|
| input | $0.807500 / 1M tokens | $0.950000 / 1M tokens |
| output | $3.400000 / 1M tokens | $4.000000 / 1M tokens |
| cache_read | $0.136000 / 1M tokens | $0.160000 / 1M tokens |
| cache_write | $0.000000 / 1M tokens | $0.000000 / 1M tokens |