Gemini

Google/gemini-2.5-flash-image

From $0.029250/ call

Nano Banana (gemini-2.5-flash-image) is Google's natively multimodal image generation and editing model designed for speed and efficiency, supporting text-to-image, conversational editing, multi-image composition, and character consistency.

Text to ImageImage to Image

More from Google

README

Google/gemini-2.5-flash-image

Nano Banana (Gemini 2.5 Flash Image) is Google's multimodal image generation and editing model built on the Gemini 2.5 Flash architecture. Unlike conventional image generators that bolt generation onto a text model, it was trained from the ground up to process text and images in a single unified step, enabling conversational creation and editing workflows.

Its core strengths are speed and cost efficiency — standard requests typically generate images in 1–2 seconds, with most tasks completing in under 10 seconds. It also leverages Gemini's world knowledge to incorporate real-world semantic understanding into image generation, going beyond simple aesthetic output.

Key Capabilities

  • Text-to-Image: Generate high-quality artistic or photorealistic images from simple or complex text descriptions.
  • Image Editing (Inpainting / Refining): Make precise, targeted edits using natural language — blur backgrounds, remove objects, alter poses, colorize black-and-white photos, and more.
  • Multi-Image Composition: Combine elements from multiple images into a single cohesive visual, supporting up to 14 reference images as input.
  • Character & Style Consistency: Maintain the appearance of characters, objects, or visual styles across multiple prompts and images — ideal for branding, storytelling, and serial content.
  • Text Rendering: Generate clear, legible, and stylistically consistent text within images for logos, posters, infographics, and diagrams.
  • Iterative Refinement: Engage in multi-turn conversations to progressively tweak and refine images step by step until the result matches your vision.

Technical Strengths

FeatureBenefit
Contextual awarenessNatively multimodal architecture deeply understands prompt semantics and intent for more accurate results
Style versatilitySwitches effortlessly between 3D renders, oil paintings, sketches, photorealistic photography, and more
Speed1–2 second image generation for standard requests, optimized for high-throughput, low-latency production environments
Conversational editingSupports multi-turn context memory, enabling progressive image modifications like working with a designer
World knowledgeLeverages the Gemini foundation model's knowledge base to understand real-world semantics (landmarks, brands, scientific concepts)
SynthID watermarkingAll generated or edited images are automatically embedded with an invisible SynthID digital watermark for AI content provenance

Pricing

QualityLinkAI PriceOfficial Price
1K0.0292500.039000