13 models across 4 generations. Some of them free. Some of them genuinely excellent. And some already discontinued.
The Gemini model lineup has gotten confusing fast since Google launched the first version in December 2023. Pro, Flash, Flash-Lite, Nano, Ultra. What does each one do? Which one should you actually use?
I use Gemini models regularly (mostly via the API and Gemini CLI), and I keep this overview updated as new models drop. Here is everything you need to know about features, prices, and availability of all ChatGPT and Claude competitor models from Google.
- Gemini 3.5 Flash (May 2026) is Google's newest Flash generation with 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas, roughly 4x faster on output tokens than comparable frontier models ($1.50 / $9 per million tokens)
- Gemini 3.1 Pro (February 2026) remains the strongest pure-reasoning model with 94.3% on GPQA Diamond and 80.6% on SWE-bench Verified, a 2x reasoning leap over its predecessors
- Gemini 2.5 Flash-Lite remains the cheapest powerful LLM on the market ($0.10/$0.40 per million tokens) and offers the best balance of speed, cost, and quality
- All modern Gemini models (from 1.5) are natively multimodal and process text, images, audio, and video simultaneously with up to 1 million token context
What are Gemini Models?
Gemini models are Google's advanced Large Language Models developed by DeepMind and Google Research.
What makes Gemini different? A few things stand out immediately:
First: Native multimodality from the start. Google trained Gemini with text, images, audio, and video, not like other providers who patched that in later. This gives Gemini a much deeper understanding of all these modalities together.
Then there's the context window: Gemini 3.1 Pro processes up to 1 million tokens. That's approximately 700,000 words or over 1,400 book pages. In a single request. That's... very large.
Google also doesn't have a one-model strategy. Instead: Nano for smartphones, Flash for most standard tasks, Pro for demanding stuff. Each has its place. And because Gemini is deeply integrated into Google Search, Workspace, and Android, it works particularly well there.
Google has taken a different approach with Gemini than OpenAI: Instead of focusing on maximum benchmark performance, the focus is on practical versatility, multimodality, and integration into the Google ecosystem.
Comparison of All Gemini Models
Here's a detailed overview of all Gemini models with their key properties:
Model | Release | Context Window | Multimodal | Status |
|---|---|---|---|---|
| Gemini 1.0 Pro | 12/2023 | 32,000 Tokens | No | Discontinued |
| Gemini 1.0 Ultra | 12/2023 | 32,000 Tokens | No | Discontinued |
| Gemini 1.5 Pro | 02/2024 | 2M Tokens | Yes | Discontinued · 04/2025 |
| Gemini 1.5 Flash | 05/2024 | 1M Tokens | Yes | Discontinued · 04/2025 |
| Gemini 2.0 Flash | 12/2024 | 1M Tokens | Yes | Discontinued · 06/2026 |
| Gemini 2.5 Flash-Lite | 06/2025 | 1M Tokens | Yes | Active |
| Gemini 2.5 Flash | 04/2025 | 1M Tokens | Yes | Active |
| Gemini 2.5 Pro | 03/2025 | 1M Tokens | Yes | Active |
| Gemini 3 Flash | 12/2025 | 1M Tokens | Yes | Active |
| Gemini 3 Pro | 11/2025 | 1M Tokens | Yes | Discontinued · 03/2026 |
| Gemini 3.1 Pro | 02/2026 | 1M Tokens | Yes | Active |
| Gemini 3.5 Flash | 05/2026 | 1M Tokens | Yes | Active |
| Gemini Nano-1 | 12/2023 | 4,000 Tokens | No | Active |
| Gemini Nano-2 | 05/2024 | 4,000 Tokens | Yes | Active |
Gemini 3.5 Flash
Released: May 2026Gemini 3.5 Flash is Google's newest Flash generation, unveiled at Google I/O 2026 (May 19, 2026). It brings frontier performance on agentic tasks and beats Gemini 3.1 Pro on several coding and agent benchmarks, while costing a fraction.
Key Features:
- Agentic coding: 76.2% on Terminal-Bench 2.1 (beats 3.1 Pro's 70.3%)
- Tool use: 83.6% on MCP Atlas
- Multimodal reasoning: 84.2% on CharXiv Reasoning
- ~4x faster on output tokens than comparable frontier models
- 1 million token input, up to 64,000 tokens output
- Multimodal: text, images, audio, video, and PDF
- Price: $1.50 input / $9 output per million tokens
- Context caching: $0.15 for cached inputs (90% discount)
- API string: gemini-3.5-flash
What makes Gemini 3.5 Flash special?
Gemini 3.5 Flash is the first Flash model that competes with Pro-tier models in agentic workflows. It wins 11 of 15 published benchmarks against Gemini 3.1 Pro, especially on tool calls, long-running tasks, and browser agents. On pure reasoning (Humanity's Last Exam, ARC-AGI-2), 3.1 Pro still leads, but for most coding and agent applications, 3.5 Flash is the better and significantly cheaper choice.
The API has also changed: instead of an integer `thinking_budget`, you now set a string enum `thinking_level` with values minimal, low, medium (the new default), and high. If you migrate from 3 Flash, set it explicitly to high or the model will think less than before.
Availability: Gemini 3.5 Flash has been generally available since May 19, 2026 through the Gemini API, Google AI Studio, Vertex AI, Google Antigravity, AI Mode in Google Search, and the Gemini app. Computer Use is not yet supported on 3.5 Flash, so stay on gemini-3-flash-preview for that. Gemini 3.5 Pro has been announced and is expected to roll out in June 2026.
Gemini 3.1 Pro
Released: February 2026Gemini 3.1 Pro is Google's strongest pure-reasoning model and marks a massive leap over all predecessors. Since February 2026, it has been generally available through the Gemini API, Google AI Studio, and Vertex AI.
Key Features:
- 2x reasoning leap over Gemini 3 Pro on complex tasks
- 94.3% on GPQA Diamond (PhD-level reasoning), a new best score
- 80.6% on SWE-bench Verified (agentic coding)
- 1 million token input, up to 64,000 tokens output
- Multimodal: text, images, audio, video, and PDF
- Tiered API pricing: $2.00 / $12.00 per million tokens (under 200K context), $4.00 / $18.00 (over 200K context)
- Context caching: 90% discount on cached input tokens ($0.20 / $0.40)
- API string: gemini-3.1-pro
Availability: Gemini 3.1 Pro is generally available since February 2026 through the Gemini API in Google AI Studio, Vertex AI, and Gemini Enterprise. It is positioned as the premium model for demanding research, code, and analysis tasks.
Gemini 3 Flash
Released: December 2025Gemini 3 Flash is Google's balanced model that combines frontier intelligence with high speed and low cost. Generally available since December 2025, it is the default model in the Gemini app.
Key Features:
- Frontier performance: 90.4% on GPQA Diamond, 81.2% on MMMU Pro
- Agentic coding: 78% on SWE-bench Verified
- 3x faster than Gemini 2.5 Pro at comparable quality
- 15% better than Gemini 2.5 Flash in overall accuracy
- 1 million token input, up to 64,000 tokens output
- Multimodal: text, images, audio, video, and PDF
- Price: $0.50 input / $3.00 output per million tokens
- Context caching: 90% cost reduction on cached tokens ($0.05 input)
- API string: gemini-3-flash
Availability: Gemini 3 Flash is generally available through the Gemini API in Google AI Studio, Vertex AI, and Gemini Enterprise. It is the default model in the Gemini app and AI Mode in Google Search. Companies like JetBrains, Bridgewater Associates, and Figma use it in production.
Gemini 3 Pro
Released: November 2025Gemini 3 Pro is the third generation of Google's premium AI model with frontier intelligence, Deep Research, and premium performance. Generally available since December 2025.
Key Features:
- Frontier intelligence from Google DeepMind
- Improved reasoning capabilities over Gemini 2.5 Pro
- Deep Research for complex, multi-step analyses
- Multimodal improvements especially in video understanding
- 1 million token context window (input), up to 64,000 tokens output
- Tiered API pricing: $2.00 / $12.00 per million tokens (under 200K context), $4.00 / $18.00 (over 200K context)
- API string: gemini-3-pro
Availability: Gemini 3 Pro is generally available through the Gemini API in Google AI Studio, Vertex AI, and Gemini Enterprise. It has since been superseded by Gemini 3.1 Pro as the most powerful model, but remains a solid choice for premium tasks.
Gemini 2.5 Pro
Released: March 2025Gemini 2.5 Pro was Google's premium variant through late 2025 and remains a solid choice for demanding tasks. (More about the Gemini API in our separate guide.)
What does Pro offer specifically?
- Top-tier performance on complex reasoning and code tasks
- 1 million token context window (experimentally also 2 million)
- Tiered pricing: $1.25 / $10 for standard prompts (≤ 128K tokens), $2.50 / $15 for longer ones
- Native multimodality (process text, images, audio, video together)
- Prompt caching with 75% discount on cached inputs ($0.3125 instead of $1.25-2.50)
- API model string: gemini-2.5-pro
What makes Gemini 2.5 Pro special?
Gemini 2.5 Pro is Google's answer to the then-current Claude 4 Opus and GPT-4o. It offers comparable performance on complex reasoning tasks and surpasses both competitors in processing very long contexts.
The 1 million token window enables analysis of complete books, large codebases, or hours of video transcripts in a single API call.
The tiered pricing structure is competitive: For most standard prompts (≤ 128K tokens) you pay only $1.25 / $10, below Claude Opus 4.7 ($5 / $25) on both input and output.
Where can you get it? The Google AI API, Google AI Studio, Vertex AI, or Google Cloud.
When do you need Pro? When you want to analyze entire codebases, comb through long research papers, summarize thick contracts, or process hours of video in one shot. This isn't meant for chatbots. That's what Flash is for and it's cheaper.
Gemini 2.5 Flash
Released: April 2025Gemini 2.5 Flash is the balanced variant, the model evergreen of the 2.5 series. It delivers 90% of Pro performance but costs a fraction and is significantly faster.
The key specs:
- 90% of Pro performance at a fraction of the cost
- 2-3x faster than Pro (inference speed)
- 1 million token context
- $0.30 input / $2.50 output per million tokens
- Prompt caching: $0.075 for cached inputs
- Multimodal: text, images, audio, video
- API string: gemini-2.5-flash
What makes Gemini 2.5 Flash special?
Gemini 2.5 Flash is the ideal production model for 90% of all use cases. It offers nearly the same quality as Pro (90% performance) at 80% lower cost and 2-3x faster response time. This makes it perfect for chatbots, content generation, and automation workflows where fast responses matter more than absolute highest precision.
Compared to ChatGPT GPT-4o ($2.50 / $10 per million tokens), Gemini 2.5 Flash offers 75-88% cost savings at similar quality, a strong price-performance ratio.
You can find Flash via Google AI API, Google AI Studio, Vertex AI, Google Cloud, and it's also the backend model for many Google products.
Specific use cases: Chatbots that need to respond quickly. Content generation (articles, marketing copy, social posts). Data extraction from unstructured sources. Email classification, sentiment analysis, summaries. Screenshot understanding and OCR. For all this, you don't need Pro, Flash is sufficient and saves money.
Gemini 2.5 Flash-Lite
Released: June 2025Gemini 2.5 Flash-Lite is what it says: The cheapest usable LLM on the market. And extremely fast at the same time.
The key numbers:
- $0.10 input / $0.40 output per million tokens (cheapest on the market)
- 5x faster than Pro models
- Still 70-80% of Flash performance
- 1 million token context
- Prompt caching: $0.025 for cached inputs
- Multimodal: text, images, audio, video
- API string: gemini-2.5-flash-lite
Why is this so interesting? It's 50-60% cheaper than GPT-4o-mini ($0.15 / $0.60) or Claude 3 Haiku ($0.25 / $1.25). And it's not slow. Rather the opposite.
The quality? 70-80% of Flash performance for chatbot responses, simple text generation, and classification. If you need millions of API calls daily, the cost savings are enormous.
Where can you find it? Google AI API, Google AI Studio, Vertex AI.
Use cases: Chatbots with millions daily. Large-scale content moderation. Sentiment analysis, categorization, tags. Real-time applications where low latency matters. Massive batch processing on a small budget.
Gemini 2.0 Flash
Released: December 2024Gemini 2.0 Flash is the older version of Flash. The advantage: it has a free tier with rate limits.
Quick info:
- Free tier (rate limits: 15 req./min, 1,500/day, 1M/month) or paid tier: $0.10 / $0.40 per million tokens
- ~80% of 2.5 Flash performance
- 1 million token context
- Multimodal: text, images, audio
- API string: gemini-2.0-flash
Use case: Prototyping, quick tests, low-volume applications. If you really need production without rate limits, upgrade to 2.5 Flash.
Gemini 1.5 Pro
Released: February 2024Gemini 1.5 Pro was a big deal in 2024: First model with 2 million token context. That was a world record at the time.
Today: It was shut down on April 30, 2025. If you're still using 1.5 Pro, migrate to 2.5 Pro or newer. Better performance, less hassle.
What 1.5 had: 2 million tokens (impressive back then). Native multimodality. Strong video and document analysis. But that was 2024.
Gemini 1.5 Flash
Released: May 2024Gemini 1.5 Flash was basically the cheaper, faster version of 1.5 Pro. Also deprecated.
The facts: 1 million token context. Fast, low cost. Multimodal. But it went offline on April 30, 2025. Users should switch to 2.5 Flash or newer.
Gemini 1.0 Pro and Ultra
Released: December 2023Gemini 1.0 was the first attempt. Today: No longer relevant.
What was it? 32,000 token context. Text-only, no images/videos. Pro was standard, Ultra was premium. Both are long gone. Google quickly replaced them with 1.5 and 2.x, much better models.
Gemini Nano
Released: December 2023 / May 2024Gemini Nano is different: On-device AI for smartphones. Runs locally, no cloud.
What's important:
- On-device: Directly on smartphones, no cloud call
- Two variants: Nano-1 (text-only) and Nano-2 (multimodal)
- 4,000 token context (small, but sufficient for smartphone tasks)
- Privacy: Everything stays local
- Hardware: Pixel smartphones, Samsung Galaxy S24+, other Android devices
- Use cases: Smart reply, live transcription, offline translation, photo editing
Availability: Already integrated in various Android phones. Google rolls it out via system updates. Developers can use the AICore API.
Price Comparison of All Gemini Models
The following table shows a detailed overview of all Gemini prices (all figures in $ per million tokens). For a detailed analysis, we recommend our API cost calculator:
Model | Status | Input (Standard) | Output (Standard) | Input (Cached) | Output (Cached) |
|---|---|---|---|---|---|
| Gemini 3.5 Flash | Active | $1.5 | $9 | $0.15 | — |
| Gemini 3.1 Pro | Active | $2 ≤200K / $4 >200K | $12 ≤200K / $18 >200K | $0.2 ≤200K / $0.4 >200K | — |
| Gemini 3 Pro | Discontinued · 03/2026 | $2 ≤200K / $4 >200K | $12 ≤200K / $18 >200K | $0.2 ≤200K / $0.4 >200K | — |
| Gemini 3 Flash | Active | $0.5 | $3 | $0.05 | — |
| Gemini 2.5 Pro | Active | $1.25 ≤200K / $2.5 >200K | $10 ≤200K / $15 >200K | $0.13 ≤200K / $0.25 >200K | — |
| Gemini 2.5 Flash | Active | $0.3 | $2.5 | $0.03 | — |
| Gemini 2.5 Flash-Lite | Active | $0.1 | $0.4 | $0.01 | — |
| Gemini 2.0 Flash | Discontinued · 06/2026 | $0.1 | $0.4 | $0.03 | — |
Important notes on the price table:
- Gemini 2.5 Pro has tiered pricing: Lower prices for prompts ≤ 128,000 tokens ($1.25 / $10), higher prices for longer prompts (greater than 128,000 tokens: $2.50 / $15)
- Gemini 3.1 Pro and 3 Pro use a two-tier model: $2.00 / $12.00 per million tokens under 200K context, $4.00 / $18.00 above 200K
- Context caching (prompt caching) enables up to 90% discount on cached input tokens with repeated use. Example: Gemini 2.5 Flash input normally costs $0.30, cached only $0.03
- Gemini 2.0 Flash offers a free tier with rate limits: 15 requests per minute, 1,500 per day, 1 million per month
- Output prices for cached prompts remain the same as standard (no discount on output)






