Google/gemini-2.5-flash-image
From $0.029250/ callNano Banana (gemini-2.5-flash-image) is Google's natively multimodal image generation and editing model designed for speed and efficiency, supporting text-to-image, conversational editing, multi-image composition, and character consistency.
More from Google
README
Google/gemini-2.5-flash-image
Nano Banana (Gemini 2.5 Flash Image) is Google's multimodal image generation and editing model built on the Gemini 2.5 Flash architecture. Unlike conventional image generators that bolt generation onto a text model, it was trained from the ground up to process text and images in a single unified step, enabling conversational creation and editing workflows.
Its core strengths are speed and cost efficiency — standard requests typically generate images in 1–2 seconds, with most tasks completing in under 10 seconds. It also leverages Gemini's world knowledge to incorporate real-world semantic understanding into image generation, going beyond simple aesthetic output.
Key Capabilities
- Text-to-Image: Generate high-quality artistic or photorealistic images from simple or complex text descriptions.
- Image Editing (Inpainting / Refining): Make precise, targeted edits using natural language — blur backgrounds, remove objects, alter poses, colorize black-and-white photos, and more.
- Multi-Image Composition: Combine elements from multiple images into a single cohesive visual, supporting up to 14 reference images as input.
- Character & Style Consistency: Maintain the appearance of characters, objects, or visual styles across multiple prompts and images — ideal for branding, storytelling, and serial content.
- Text Rendering: Generate clear, legible, and stylistically consistent text within images for logos, posters, infographics, and diagrams.
- Iterative Refinement: Engage in multi-turn conversations to progressively tweak and refine images step by step until the result matches your vision.
Technical Strengths
| Feature | Benefit |
|---|---|
| Contextual awareness | Natively multimodal architecture deeply understands prompt semantics and intent for more accurate results |
| Style versatility | Switches effortlessly between 3D renders, oil paintings, sketches, photorealistic photography, and more |
| Speed | 1–2 second image generation for standard requests, optimized for high-throughput, low-latency production environments |
| Conversational editing | Supports multi-turn context memory, enabling progressive image modifications like working with a designer |
| World knowledge | Leverages the Gemini foundation model's knowledge base to understand real-world semantics (landmarks, brands, scientific concepts) |
| SynthID watermarking | All generated or edited images are automatically embedded with an invisible SynthID digital watermark for AI content provenance |
Pricing
| Quality | LinkAI Price | Official Price |
|---|---|---|
| 1K | 0.029250 | 0.039000 |