Context caching

In a typical AI workflow, you might pass the same input tokens to a model over and over. The Gemini API offers implicit caching to optimize cost and performance for these repeated tokens.

Implicit caching

Implicit caching is enabled by default for all Gemini 2.5 and newer models. When a request hits the cache, we automatically pass the cost savings on to you; there is nothing you need to do to enable it. Caching only applies once a request meets a minimum input token count, listed per model in the following table:

| Model | Min token limit |
| --- | --- |
| Gemini 3 Flash Preview | 1024 |
| Gemini 3 Pro Preview | 4096 |
| Gemini 2.5 Flash | 1024 |
| Gemini 2.5 Pro | 4096 |
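
To check whether a prompt clears the minimum before sending it, you can count its tokens first. Here is a minimal sketch using the google-genai Python SDK; the model name and file path are placeholder assumptions:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Hypothetical large document; implicit caching for Gemini 2.5 Flash
# requires at least 1024 input tokens.
document = open("contract.txt").read()

count = client.models.count_tokens(
    model="gemini-2.5-flash",
    contents=document,
)
print(f"Input tokens: {count.total_tokens}")
```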

To increase the chance of an implicit cache hit (see the sketch after this list):

  • Try putting large, common content at the beginning of your prompt
  • Try to send requests with a similar prefix within a short window of time
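
As an illustration, here is a minimal sketch using the google-genai Python SDK; the model name, file path, and questions are placeholder assumptions. The large document is sent first as a shared prefix, and only the short question at the end changes between requests:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Large, shared content goes first so consecutive requests
# share a cacheable prefix.
document = open("contract.txt").read()

questions = [
    "Summarize the termination clauses.",
    "List all payment deadlines.",
]

for question in questions:
    # Same prefix (the document), different suffix (the question).
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[document, question],
    )
    print(response.text)
```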

You can see the number of tokens that were cache hits in the response object's usage_metadata (Python) or usageMetadata (JavaScript) field.
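
For example, with the google-genai Python SDK, usage_metadata includes a cached_content_token_count field; the model name and contents below are placeholders:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="<same long document as before, plus a new question>",
)

usage = response.usage_metadata
# cached_content_token_count may be None when no tokens were
# served from the cache.
print(f"Prompt tokens: {usage.prompt_token_count}")
print(f"Cached tokens: {usage.cached_content_token_count or 0}")
```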