MoonshotAI

Moonshot/kimi-k2.6

From $0.136 / 1M tokens

Moonshot AI's open-weight 1T-parameter MoE multimodal agentic model, excelling in long-horizon coding, 300-sub-agent swarm orchestration, and native tool use — built for autonomous software engineering and agent workflows.

Chat

More from Moonshot

README

Moonshot/kimi-k2.6

Supported Functionality

ItemSpecification
InputText, Image, Video
OutputText
Context262,144 tokens (~256K)
Max Output262,142 tokens
Vision✓ Supported (native multimodal via MoonViT vision encoder)
Function Calling✓ Supported (OpenAI / Anthropic-compatible API)

Description

Kimi K2.6 is Moonshot AI's open-weight flagship model, released on April 20, 2026 under a Modified MIT License. Built on a Mixture-of-Experts (MoE) architecture, it has 1 trillion total parameters with 32 billion activated per token (8 of 384 experts plus 1 shared expert), uses 61 layers with Multi-head Latent Attention (MLA) and SwiGLU activation, and ships natively with INT4 quantization.

K2.6's defining advance is long-horizon autonomous execution — the model can run for 12+ hours and 4,000+ tool calls on a single engineering goal without human intervention. It also introduces an Agent Swarm capable of orchestrating up to 300 domain-specialized sub-agents across 4,000 coordinated steps. On SWE-Bench Pro it scores 58.6, matching GPT-5.5 and surpassing Claude Opus 4.6 (53.4) and Gemini 3.1 Pro (54.2); on Humanity's Last Exam (with tools) it leads every frontier model at 54.0.

Key Capabilities

  • Long-Horizon Coding: Sustains 12+ hour autonomous engineering sessions; scores 80.2 on SWE-Bench Verified and 89.6 on LiveCodeBench v6.
  • Agent Swarm Orchestration: Dispatches up to 300 sub-agents across 4,000 coordinated steps in one run; scores 86.3 on BrowseComp in swarm mode.
  • Native Multimodality: Unified text/image/video processing via the MoonViT encoder — no separate vision pipeline required.
  • Coding-Driven Design: Generates full front-end interfaces from sketches or natural language, wiring authentication and databases automatically (50%+ gain over K2.5 on Vercel's Next.js benchmark).
  • Frontier Agentic Tool Use: Leads on HLE (with tools) at 54.0 and reaches 66.7 on Terminal-Bench 2.0.
  • Claw Groups Collaboration: Humans and heterogeneous agents on any device or model can collaborate in one shared workspace, with K2.6 as adaptive coordinator.
  • Long Context with Caching: 262K-token window plus prefix caching (as low as $0.15/M cached tokens at some providers) — ideal for long sessions and tool-heavy workflows.

Technical Strengths

FeatureBenefit
1T MoE / 32B ActiveTrillion-scale knowledge breadth with inference cost close to a 32B dense model, dramatically lowering per-token cost
Native INT4 QuantizationLower memory footprint and deployment cost; on-prem and self-hosted inference without post-hoc quality loss
Thinking / Instant Dual ModeDeep chain-of-thought for hard problems, fast direct answers for simple queries — developer-controllable per call
MuonClip OptimizerZero training instability at 1T scale across 15.5T pretraining tokens
Modified MIT LicenseWeights freely downloadable, self-hostable, and commercially usable — no vendor lock-in
OpenAI/Anthropic-Compatible APIExisting client code migrates in minutes; works with Kimi Code CLI, vLLM, SGLang, and KTransformers

Capability Ratings

DimensionRatingNotes
ReasoningExcellentLeads HLE-with-tools; AIME 2026 (96.4) just below GPT-5.5 but still top-tier
CodingTop-tierMatches GPT-5.5 on SWE-Bench Pro (58.6); beats Claude Opus 4.6 on LiveCodeBench v6
Creative WritingStrongFluent prose, though tuning prioritizes engineering and agentic workloads over creative output
MultimodalExcellentUnified native architecture for text/image/video via MoonViT, including visual coding
Response SpeedModerate~59 tokens/s on first-party API; Thinking mode is verbose, Instant mode is faster
Context WindowHuge262K tokens — fits full codebases, long sessions, and large tool schemas simultaneously

Use Cases

  • Autonomous Software Engineering: Hand off long-horizon dev tasks — e.g. a 13-hour refactor of a financial matching engine yielded a 185% median throughput gain.
  • AI Coding Assistants & IDE Integration: Works with Kimi Code CLI, Cursor, VS Code, and other IDEs for long-context, low-cost code generation and refactoring.
  • Multi-Agent R&D Workflows: Use Agent Swarm to parallelize hundreds of sub-agents producing research docs, websites, or analytical reports.
  • Front-End Auto-Generation: Turn design specs or natural language into production-ready Next.js / React apps with backend wiring included.
  • Long-Context Retrieval & QA: 256K window enables end-to-end reasoning over entire books, repositories, or large PDFs.
  • Self-Hosted Enterprise Deployment: Open weights + INT4 make on-prem and private-cloud rollouts compliant and cost-efficient.
  • Proactive Background Agents: 24/7 schedule management, code execution, and cross-platform orchestration for ops automation and monitoring.

Pricing

Token TypeLinkAI PriceOfficial Price
input$0.807500 / 1M tokens$0.950000 / 1M tokens
output$3.400000 / 1M tokens$4.000000 / 1M tokens
cache_read$0.136000 / 1M tokens$0.160000 / 1M tokens
cache_write$0.000000 / 1M tokens$0.000000 / 1M tokens