Google Gemini has established itself as one of the strongest competitors to ChatGPT and Claude since its launch in December 2023.
The evolution is impressive: Starting with the first 1.0 models, then the major 1.5 versions with 2 million token context, up to the current 2.5 models. Each generation has surpassed the previous one.
But which Gemini model is right for your application? What distinguishes Pro from Flash? And how does Gemini compare to ChatGPT and Claude?
In this article, I'll explain everything important about the different Gemini models, their features, prices, and availability.
- Gemini 2.5 Pro is the current premium production model (released November 2024) with a 1 million token context and strong performance in code and analysis, at $1.25-2.50 / $10-15 per million tokens
- Gemini 2.5 Flash-Lite is the cheapest capable LLM on the market ($0.10 / $0.40 per million tokens) and delivers an outstanding combination of speed, cost, and solid quality
- All modern Gemini models (from 1.5) are natively multimodal and process text, images, audio, and video simultaneously with up to 1 million token context
What are Gemini Models?
Gemini models are Google's advanced Large Language Models developed by DeepMind and Google Research.
What makes Gemini different? A few things stand out immediately:
First: Native multimodality from the start. Google trained Gemini on text, images, audio, and video from day one, unlike other providers who bolted that on later. This gives Gemini a much deeper joint understanding of all these modalities.
Then there's the context window: Gemini 2.5 Pro processes up to 1 million tokens (experimentally even 2 million). That's approximately 700,000 words or over 1,400 book pages. In a single request. That's... very large.
Google also doesn't have a one-model strategy. Instead: Nano for smartphones, Flash for most standard tasks, Pro for demanding stuff. Each has its place. And because Gemini is deeply integrated into Google Search, Workspace, and Android, it works particularly well there.
Google has taken a different approach with Gemini than OpenAI: Instead of focusing on maximum benchmark performance, the focus is on practical versatility, multimodality, and integration into the Google ecosystem.
Comparison of All Gemini Models
Here's a detailed overview of all Gemini models with their key properties:
| Model | Release | Context Window | Multimodal | Status |
|---|---|---|---|---|
| Gemini 1.0 Pro | 12/2023 | 32,000 Tokens | No | Discontinued |
| Gemini 1.0 Ultra | 12/2023 | 32,000 Tokens | No | Discontinued |
| Gemini 1.5 Pro | 02/2024 | 2M Tokens | Yes | Discontinued (04/2025) |
| Gemini 1.5 Flash | 05/2024 | 1M Tokens | Yes | Discontinued (04/2025) |
| Gemini 2.0 Flash | 09/2024 | 1M Tokens | Yes | Active |
| Gemini 2.5 Flash-Lite | 11/2024 | 1M Tokens | Yes | Active |
| Gemini 2.5 Flash | 11/2024 | 1M Tokens | Yes | Active |
| Gemini 2.5 Pro | 11/2024 | 1M Tokens | Yes | Active |
| Gemini 3.0 Pro Preview | 11/2025 | 1M Tokens | Yes | Preview |
| Gemini Nano-1 | 12/2023 | 4,000 Tokens | No | Active |
| Gemini Nano-2 | 05/2024 | 4,000 Tokens | Yes | Active |
Gemini 3.0 Pro Preview
Released: November 2025
Gemini 3.0 Pro Preview is the latest generation of Google's AI models and is currently in early access.
Key Features:
- Latest AI generation from Google DeepMind
- Preview access for selected developers and companies
- Improved reasoning capabilities compared to Gemini 2.5
- Multimodal improvements especially in video understanding
- 1 million token context window (input), up to 64,000 tokens output
- Tiered API pricing: $2.00 / $12.00 per million tokens (under 200k context), $4.00 / $18.00 (over 200k context)
- Access via Google AI Studio Early Access Program
Availability: Gemini 3.0 Pro is currently only available as a preview for selected partners. The full public release is expected for early 2026. During the preview phase, Google may offer free API usage for testing purposes.
Gemini 2.5 Pro
Released: November 2024
Gemini 2.5 Pro is Google's current premium variant. It has the highest performance of the entire family: if you need to solve complex tasks, this is your model. (More about the Gemini API in our separate guide.)
What does Pro offer specifically?
- State-of-the-art performance on complex reasoning and code tasks
- 1 million token context window (experimentally also 2 million)
- Tiered pricing: $1.25 / $10 for standard prompts (≤ 128K tokens), $2.50 / $15 for longer ones
- Native multimodality: process text, images, audio, and video together
- Prompt caching with 75 % discount on cached inputs ($0.3125 instead of $1.25-2.50)
- API model string: gemini-2.5-pro
What makes Gemini 2.5 Pro special?
Gemini 2.5 Pro is Google's answer to Claude 4 Opus and GPT-4o. It offers comparable performance on complex reasoning tasks and surpasses both competitors in processing very long contexts. The 1 million token window enables analysis of complete books, large codebases, or hours of video transcripts in a single API call.
The tiered pricing structure makes it economical: For most standard prompts (≤ 128K tokens) you pay only $1.25 / $10, significantly cheaper than Claude 4 Opus ($15 / $75) with comparable performance.
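To make the tiered pricing concrete, here's a minimal cost-estimator sketch. The prices and the 128K threshold mirror the figures above; check Google's current price list before relying on them.

```python
# Sketch: estimating the cost of a single Gemini 2.5 Pro request under the
# tiered pricing described above (USD per million tokens, tier switch at 128K
# input tokens). Prices are the article's figures, not a live price feed.

def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one Gemini 2.5 Pro request."""
    if input_tokens <= 128_000:
        input_price, output_price = 1.25, 10.00   # standard tier
    else:
        input_price, output_price = 2.50, 15.00   # long-context tier
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A 50K-token prompt with a 2K-token answer stays in the cheap tier:
print(f"${gemini_pro_cost(50_000, 2_000):.4f}")   # → $0.0825
```

In this sketch the input size alone decides the tier for both input and output prices, matching the price table later in the article.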
Where can you get it? The Google AI API, Google AI Studio, Vertex AI, or Google Cloud.
When do you need Pro? When you want to analyze entire codebases, comb through long research papers, summarize thick contracts, or process hours of video in one shot. This isn't meant for chatbots โ that's what Flash is for and it's cheaper.
Gemini 2.5 Flash
Released: November 2024
Gemini 2.5 Flash is the balanced variant, the workhorse of the 2.5 series. It delivers 90 % of Pro performance but costs a fraction and is significantly faster.
The key specs:
- 90 % of Pro performance at a fraction of the cost
- 2-3x faster than Pro (inference speed)
- 1 million token context
- $0.30 input / $2.50 output per million tokens
- Prompt caching: $0.075 for cached inputs
- Multimodal: text, images, audio, video
- API string: gemini-2.5-flash
What makes Gemini 2.5 Flash special?
Gemini 2.5 Flash is the ideal production model for 90 % of all use cases. It offers nearly the same quality as Pro (90 % performance) at 80 % lower cost and 2-3x faster response time. This makes it perfect for chatbots, content generation, and automation workflows where fast responses matter more than absolute highest precision.
Compared to OpenAI's GPT-4o ($15 / $60 per million tokens), Gemini 2.5 Flash offers roughly 97-98 % cost savings at similar quality: an unbeatable price-performance ratio.
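That savings figure is easy to sanity-check; the exact percentage depends on your input/output token mix, as in this sketch (the 10K-in / 1K-out mix is an assumption for illustration):

```python
# Sketch: per-request cost of Gemini 2.5 Flash vs. GPT-4o at the article's
# list prices (USD per million tokens). Token counts are assumptions.

def request_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

flash = request_cost(10_000, 1_000, 0.30, 2.50)    # Gemini 2.5 Flash
gpt4o = request_cost(10_000, 1_000, 15.00, 60.00)  # GPT-4o (article figures)

savings = 1 - flash / gpt4o
print(f"Flash: ${flash:.4f}  GPT-4o: ${gpt4o:.4f}  savings: {savings:.1%}")
```

At this particular mix the savings come out just under 98 %.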
You can find Flash via Google AI API, Google AI Studio, Vertex AI, and Google Cloud; it's also the backend model for many Google products.
Specific use cases: Chatbots that need to respond quickly. Content generation (articles, marketing copy, social posts). Data extraction from unstructured sources. Email classification, sentiment analysis, summaries. Screenshot understanding and OCR. For all this, you don't need Pro, Flash is sufficient and saves money.
Gemini 2.5 Flash-Lite
Released: November 2024
Gemini 2.5 Flash-Lite is exactly what the name suggests: the cheapest usable LLM on the market, and extremely fast at the same time.
The key numbers:
- $0.10 input / $0.40 output per million tokens (cheapest on the market)
- 5x faster than Pro models
- Still 70-80 % of Flash performance
- 1 million token context
- Prompt caching: $0.025 for cached inputs
- Multimodal: text, images, audio, video
- API string: gemini-2.5-flash-lite
Why is this so interesting? It's about a third cheaper than GPT-4o-mini ($0.15 / $0.60) and 60-70 % cheaper than Claude 3 Haiku ($0.25 / $1.25). And it's not slow, quite the opposite.
The quality? 70โ80 % of Flash performance for chatbot responses, simple text generation, and classification. If you need millions of API calls daily, the cost savings are enormous.
Where can you find it? Google AI API, Google AI Studio, Vertex AI.
Use cases: Chatbots with millions daily. Large-scale content moderation. Sentiment analysis, categorization, tags. Real-time applications where low latency matters. Massive batch processing on a small budget.
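At these prices, even massive volumes stay affordable. A rough back-of-the-envelope sketch (the per-call token averages are assumptions, not measurements):

```python
# Sketch: daily/monthly Flash-Lite cost for a high-volume chatbot at the
# article's list prices. The per-call token averages are assumptions.

IN_PRICE, OUT_PRICE = 0.10, 0.40   # USD per million tokens
calls_per_day = 1_000_000
in_tok, out_tok = 500, 200         # assumed averages per call

daily = calls_per_day * (in_tok * IN_PRICE + out_tok * OUT_PRICE) / 1_000_000
print(f"~${daily:,.0f}/day, ~${daily * 30:,.0f}/month")   # → ~$130/day, ~$3,900/month
```

A million short chatbot calls per day for a low three-figure daily bill is the kind of math that makes Flash-Lite attractive for scale.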
Gemini 2.0 Flash
Released: September 2024
Gemini 2.0 Flash is the older version of Flash. The advantage: It's free, with rate limits.
Quick info:
- 100 % free (rate limits: 15 req./min, 1,500/day, 1M/month)
- ~80 % of 2.5 Flash performance
- 1 million token context
- Multimodal: text, images, audio
- API string: gemini-2.0-flash
Use case: Prototyping, quick tests, low-volume applications. If you need production use without rate limits, upgrade to 2.5 Flash.
Gemini 1.5 Pro
Released: February 2024
Gemini 1.5 Pro was a big deal in 2024: the first model with a 2 million token context, a world record at the time.
Today: It was shut down on April 30, 2025. If you're still using 1.5 Pro, migrate to 2.5 Pro: better performance, less hassle.
What 1.5 had: 2 million tokens (impressive back then). Native multimodality. Strong video and document analysis. But that was 2024.
Gemini 1.5 Flash
Released: May 2024
Gemini 1.5 Flash was basically the cheaper, faster version of 1.5 Pro. Also deprecated.
The facts: 1 million token context. Fast, low cost. Multimodal. It also went offline on April 30, 2025; users should switch to 2.5 Flash.
Gemini 1.0 Pro and Ultra
Released: December 2023
Gemini 1.0 was the first attempt. Today: No longer relevant.
What was it? 32,000 token context. Text-only, no images/videos. Pro was standard, Ultra was premium. Both are long gone. Google quickly replaced them with 1.5 and 2.x โ much better models.
Gemini Nano
Released: December 2023 / May 2024
Gemini Nano is different: on-device AI for smartphones. It runs locally, with no cloud.
What's important:
- On-device: Directly on smartphones, no cloud call
- Two variants: Nano-1 (text-only) and Nano-2 (multimodal)
- 4,000 token context (small, but sufficient for smartphone tasks)
- Privacy: Everything stays local
- Hardware: Pixel smartphones, Samsung Galaxy S24+, other Android devices
- Use cases: Smart reply, live transcription, offline translation, photo editing
Availability: Already integrated in various Android phones. Google rolls it out via system updates. Developers can use the AICore API.
Price Comparison of All Gemini Models
The following table shows a detailed overview of all Gemini prices (all figures in $ per million tokens). For a detailed analysis, we recommend our API cost calculator:
| Model | Status | Input (Standard) | Output (Standard) | Input (Cached) | Output (Cached) |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | Active | $1.25 / $2.50 (≤ 128K / > 128K) | $10 / $15 (≤ 128K / > 128K) | $0.3125 | $10 / $15 |
| Gemini 2.5 Flash | Active | $0.30 | $2.50 | $0.075 | $2.50 |
| Gemini 2.5 Flash-Lite | Active | $0.10 | $0.40 | $0.025 | $0.40 |
| Gemini 2.0 Flash | Active | Free (rate limits) | Free (rate limits) | n/a | n/a |
Important notes on the price table:
- Gemini 2.5 Pro has tiered pricing: Lower prices for prompts ≤ 128,000 tokens ($1.25 / $10), higher prices for prompts over 128,000 tokens ($2.50 / $15)
- Context caching (prompt caching) enables 75 % discount on cached input tokens with repeated use. Example: Gemini 2.5 Flash input normally costs $0.30, cached only $0.075
- Gemini 2.0 Flash is completely free with rate limits: 15 requests per minute, 1,500 per day, 1 million per month
- Output prices for cached prompts remain the same as standard (no discount on output)
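A quick sketch of what the 75 % caching discount means in practice, using the Gemini 2.5 Flash rates from the table (the 30K-token reusable prefix is an assumed scenario):

```python
# Sketch: per-call savings from prompt caching at the article's Gemini 2.5
# Flash rates. Only the cached part of the input gets the discount; output
# pricing is unchanged.

IN_STD, IN_CACHED, OUT = 0.30, 0.075, 2.50   # USD per million tokens

def call_cost(prompt_tok: int, cached_tok: int, out_tok: int) -> float:
    fresh = prompt_tok - cached_tok
    return (fresh * IN_STD + cached_tok * IN_CACHED + out_tok * OUT) / 1_000_000

without = call_cost(30_500, 0, 500)       # nothing cached
with_c = call_cost(30_500, 30_000, 500)   # 30K-token prefix served from cache
print(f"per call: ${without:.6f} vs ${with_c:.6f}")
```

With a large shared prefix (a long system prompt or reference document), the per-call cost in this scenario drops by roughly 65 %, even though the output price stays the same.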