Gemini 2.0 Flash is shutting down: 2026 API pricing and migration options
Google marks Gemini 2.0 Flash and 2.0 Flash-Lite for shutdown on June 1, 2026. This guide compares 2.5 Flash-Lite, 2.5 Flash, Gemini 3 Flash, and 3.1 Flash-Lite as migration paths.
1. Why this needs attention now
Google's Gemini deprecations page
lists both gemini-2.0-flash and gemini-2.0-flash-lite with a shutdown date of
June 1, 2026. That is not a marketing label. Once a model is shut down, the endpoint
stops being available, so production apps should migrate before that date rather than waiting for errors
to show up in logs.
The direct replacements are simple on paper: move 2.0 Flash to 2.5 Flash, and 2.0 Flash-Lite to 2.5 Flash-Lite. The cost story is less simple. 2.5 Flash-Lite keeps the old 2.0 Flash price class, while 2.5 Flash is much more expensive on output tokens. Newer Gemini 3-family options add another trade-off: better long-term runway, but higher token pricing.
2. Replacement models at a glance
| Current model | Default replacement | Use when |
|---|---|---|
gemini-2.0-flash-lite | gemini-2.5-flash-lite | High-volume classification, extraction, routing, translation, and simple multimodal jobs. |
gemini-2.0-flash | gemini-2.5-flash | Chat, RAG, agent workflows, and workloads that need the 1M-token context window. |
| Planning a second migration | gemini-3.1-flash-lite | Teams that want a newer low-cost Gemini 3-family model and can afford higher token pricing. |
Gemini 3 Flash Preview and Gemini 3.5 Flash are not drop-in cost replacements for most 2.0 Flash users. Use them when quality, grounding, or newer model behavior matters more than preserving the old bill.
3. Where the pricing changes
The numbers below use Google's Gemini API pricing page as checked on May 24, 2026. Prices are USD per 1M tokens for text/image/video unless noted.
| Model | Input | Output | Cached input |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Not available |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.01 |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.03 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $0.025 |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $0.05 |
The important jump is output price. Moving from 2.0 Flash to 2.5 Flash triples input price, but raises output price from $0.40 to $2.50. If your app produces long answers, explanations, or agent traces, the migration can be more than a small version bump.
Search grounding is a separate line item. The 2.5 Flash family keeps the older paid-tier search grounding structure: 1,500 free grounded prompts per day, then $35 per 1,000 grounded prompts. Gemini 3-family pricing uses a monthly free bucket for all Gemini 3 models, then $14 per 1,000 search queries. A single prompt can trigger more than one search query, so measure this from billing data before turning grounding on by default.
4. Pick by workload
For bulk classification, extraction, moderation, routing, and translation, start with 2.5 Flash-Lite. It is the closest practical replacement for both 2.0 Flash-Lite and many lightweight 2.0 Flash workloads.
For customer support chat and RAG, start with 2.5 Flash. The higher output price hurts, but the 1M-token context window and context caching can matter more than sticker price when you repeat long instructions, policies, or retrieved documents across many requests.
For agents and tool-use workflows, test 2.5 Flash-Lite and 2.5 Flash side by side. The right answer depends on how often the model retries, calls tools, or emits long intermediate reasoning. Gemini output pricing includes thinking tokens where the pricing page says so, so cost can move faster than visible response length.
For search-grounded answers, compare the token bill and the grounding bill separately. Gemini 3-family search pricing can be cheaper per listed unit, but the model token price is higher and each prompt may issue multiple search queries.
5. Migration checklist
- Replace the model id in one staging environment first, not in production.
- Replay real prompts and log input tokens, output tokens, thinking tokens, cache hits, and tool calls.
- Compare monthly cost with your actual input/output ratio, not only the pricing table.
- Check whether your workload depends on preview models, rate limits, or search grounding behavior.
- Run both old and new models for at least a few days before June 1, 2026, then switch traffic gradually.
If you only need a default answer: migrate 2.0 Flash-Lite to 2.5 Flash-Lite, and migrate 2.0 Flash to 2.5 Flash only after you have measured output length. If the new output bill is too high, try 2.5 Flash-Lite before jumping to a Gemini 3 model.