Kling

Kling/kling-v3

From $0.060970 / call

Kuaishou's multimodal video model with native 4K/60fps output, AI Director multi-shot storyboarding, native multilingual audio, and character consistency across shots. It combines video understanding, generation, and editing in one workflow.

Text to Video · Image to Video

Kling V3 (Kling 3.0) is the third-generation video model series launched by Kuaishou Technology on February 4, 2026. It comprises Video 3.0, Video 3.0 Omni, Image 3.0, and Image 3.0 Omni. Built on the Multi-modal Visual Language (MVL) architecture, it unifies understanding, generation, and editing of text, images, audio, and video in a single workflow. The series is positioned as a shift from a pure generation tool toward a creative partner that can interpret artistic intent.

Compared with the preceding Kling 2.6 and O1 series, V3 introduces four headline advances:

  • Native 4K/60fps output: presented as the first AI video model to generate broadcast-quality footage directly from text prompts.
  • AI Director system: automatically plans up to 6 shots within a single 15-second generation, choosing shot sizes, perspectives, and camera movements.
  • Character consistency: the "Director Memory" system in Elements 3.0 maintains a character's visual identity across multiple angles, shots, and complex motion.
  • Native multilingual audio: synchronized dialogue and sound effects in Chinese, English, Japanese, Korean, and Spanish, plus various dialect accents.

Since its launch in June 2024, Kling AI has grown to serve over 60 million creators worldwide and has produced more than 600 million videos.

Key Capabilities

  • Native 4K / 60fps: Generates 3840×2160 video at 60 frames per second directly, without upscaling or frame interpolation, so clarity holds on large displays at broadcast-quality standards.
  • AI Director Multi-Shot Storyboard: Automatically plans up to 6 shots within a single 15-second generation, with customizable duration, shot size, perspective, narrative content, and camera movement per shot while maintaining spatial continuity.
  • Ultimate Character Consistency: Elements 3.0's "Director Memory" system precisely tracks and maintains character visual identity across multiple camera angles, shot transitions, and scene changes, even when combined with voice synchronization and complex camera movements.
  • Reference-Based Generation (3.0 Omni): Upload a reference video and the model extracts the character's visual and voice traits, then reproduces the same character in new scenes.
  • Native Multilingual Audio: Synchronously produces dialogue, sound effects, and ambient audio during video generation, supporting Chinese, English, Japanese, Korean, Spanish, and various dialect accents.
  • 4K Ultra-HD Images (Image 3.0): Companion image models support 2K and 4K ultra-high-definition output, for uses ranging from virtual scene visualization to full-scale production assets.
  • Visual Chain of Thought (vCoT): The model performs structured reasoning before rendering: it decomposes scene elements, plans motion paths, considers lighting and composition, and then executes generation.
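
The AI Director limits listed above (at most 6 shots within a 15-second generation, each with its own duration, shot size, perspective, and camera movement) can be sketched as a small pre-flight check. This is a minimal illustration only: the `Shot` fields, `validate_storyboard`, and the constant names are hypothetical and are not part of any Kling API, and the sketch assumes the 15 seconds is a cap on the summed shot durations.

```python
from dataclasses import dataclass

# Limits taken from the capability list; all names below are hypothetical.
MAX_SHOTS = 6
MAX_TOTAL_SECONDS = 15.0


@dataclass(frozen=True)
class Shot:
    """One storyboard entry. Field names are illustrative, not Kling's API."""
    duration_s: float
    shot_size: str        # e.g. "wide", "medium", "close-up"
    perspective: str      # e.g. "eye-level", "low-angle"
    camera_movement: str  # e.g. "static", "dolly-in"
    narrative: str        # what happens in the shot


def validate_storyboard(shots: list[Shot]) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if not shots:
        problems.append("storyboard has no shots")
    if len(shots) > MAX_SHOTS:
        problems.append(f"{len(shots)} shots exceeds the limit of {MAX_SHOTS}")
    for i, shot in enumerate(shots, start=1):
        if shot.duration_s <= 0:
            problems.append(f"shot {i} has a non-positive duration")
    total = sum(shot.duration_s for shot in shots)
    if total > MAX_TOTAL_SECONDS:
        problems.append(
            f"total duration {total:g}s exceeds {MAX_TOTAL_SECONDS:g}s"
        )
    return problems
```

Checking a plan locally like this before submitting it would catch over-long or over-segmented storyboards early; how the service itself reports such cases is not described in the source.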

Technical Strengths

Feature | Benefit
Native 4K / 60fps output | The first AI model to directly generate true 4K/60fps video; clarity holds on 65-inch displays, meeting broadcast and film production standards
AI Director system | Automatically dispatches shot sizes, perspectives, and camera movements across up to 6 shots in a single generation, evolving from "generating clips" to "directing films"
Director Memory consistency | Elements 3.0's Director Memory precisely maintains character identity across multi-shot, multi-angle sequences, making the storyboarding feature truly usable for narrative content
Unified workflow | Integrates video generation, editing, and audio synthesis into a single pipeline, eliminating the fragmented multi-tool switching of traditional workflows
Multilingual audio sync | Natively supports synchronized dialogue generation in 5 languages with multiple accents; different characters in a single scene can speak different languages
MVL unified architecture | Multi-modal Visual Language framework processes text, images, audio, and video in a unified semantic space for precise execution of compound creative instructions
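
To put the native 4K/60fps figures in perspective, the arithmetic below works out the pixel and frame counts implied by the numbers in the text (3840×2160, 60 fps, up to 15 seconds per generation). It is a back-of-the-envelope sketch; the function names are made up for illustration, and it says nothing about how the model renders frames internally.

```python
# Figures from the text: 3840x2160 at 60 fps, generations of up to 15 seconds.
WIDTH, HEIGHT, FPS, CLIP_SECONDS = 3840, 2160, 60, 15


def pixels_per_frame(width: int = WIDTH, height: int = HEIGHT) -> int:
    """Number of pixels in one frame."""
    return width * height


def frames_in_clip(seconds: int = CLIP_SECONDS, fps: int = FPS) -> int:
    """Number of frames in a clip of the given length."""
    return seconds * fps


def pixel_ratio_vs_1080p(width: int = WIDTH, height: int = HEIGHT) -> float:
    """How many times more pixels a frame has than a 1920x1080 frame."""
    return pixels_per_frame(width, height) / (1920 * 1080)


if __name__ == "__main__":
    print(pixels_per_frame())      # 8294400
    print(frames_in_clip())        # 900
    print(pixel_ratio_vs_1080p())  # 4.0
```

A full-length 15-second clip at these settings therefore contains 900 frames, each with four times the pixels of a 1080p frame.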

Pricing

Resolution | Audio | LinkAI Price ($) | Official Price ($)
1080P | false | 0.081270 | 0.116100
1080P | true | 0.121940 | 0.174200
720P | false | 0.060970 | 0.087100
720P | true | 0.091420 | 0.130600
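
The pricing rows can be turned into a small lookup for estimating cost. The sketch below assumes the listed figures are dollar prices per call, as the "From $0.060970 / call" line at the top of the page suggests; `PRICES`, `estimate_cost`, and `discount_ratio` are illustrative names, not part of any LinkAI or Kling SDK.

```python
from decimal import Decimal

# (resolution, audio enabled) -> (LinkAI price, official price), copied from the table.
# Assumption: values are dollars per call.
PRICES = {
    ("1080P", False): (Decimal("0.081270"), Decimal("0.116100")),
    ("1080P", True): (Decimal("0.121940"), Decimal("0.174200")),
    ("720P", False): (Decimal("0.060970"), Decimal("0.087100")),
    ("720P", True): (Decimal("0.091420"), Decimal("0.130600")),
}


def estimate_cost(resolution: str, audio: bool, calls: int = 1) -> Decimal:
    """Estimated LinkAI cost for a number of calls at the given settings."""
    linkai_price, _official_price = PRICES[(resolution, audio)]
    return linkai_price * calls


def discount_ratio(resolution: str, audio: bool) -> Decimal:
    """LinkAI price as a fraction of the official price."""
    linkai_price, official_price = PRICES[(resolution, audio)]
    return linkai_price / official_price


if __name__ == "__main__":
    for key in PRICES:
        print(key, discount_ratio(*key))
```

Dividing the two price columns gives 0.7 for every row, so the LinkAI price in this table is 70% of the official price at each setting. `Decimal` is used instead of `float` so that six-decimal prices multiply and compare exactly.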