Media resolution
The media_resolution parameter controls how the Gemini API processes media inputs like images, videos, and PDF documents by determining the maximum number of tokens allocated for media inputs, allowing you to balance response quality against latency and cost. For different settings, default values and how they correspond to tokens, see the Token counts section.
You can configure media resolution for individual media objects (content items) within your request (Gemini 3 only).
Per-content-item media resolution (Gemini 3 only)
Gemini 3 allows you to set media resolution for individual media objects within your request, offering fine-grained optimisation of token usage. You can mix resolution levels in a single request. For example, using high resolution for a complex diagram and low resolution for a simple contextual image.
Python
from google import genai
from google.genai import types
client = genai.Client()
myfile = client.files.upload(file="path/to/image.jpg")
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "text", "text": "Describe this image:"},
{
"type": "image",
"uri": myfile.uri,
"mime_type": myfile.mime_type,
"resolution": "high"
}
]
)
print(interaction.steps[-1].content[0].text)
JavaScript
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
async function main() {
const myfile = await ai.files.upload({
file: "path/to/image.jpg",
config: { mimeType: "image/jpeg" },
});
const interaction = await ai.interactions.create({
model: "gemini-3-flash-preview",
input: [
{ type: "text", text: "Describe this image:" },
{
type: "image",
uri: myfile.uri,
mimeType: myfile.mimeType,
resolution: "high"
}
],
});
console.log(interaction.steps.at(-1).content[0].text);
}
await main();
REST
# First upload the file using the Files API, then use the URI:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-flash-preview",
"input": [
{"type": "text", "text": "Describe this image:"},
{
"type": "image",
"uri": "YOUR_FILE_URI",
"mime_type": "image/jpeg",
"resolution": "high"
}
]
}'
Available resolution values
The Gemini API defines the following levels for media resolution:
unspecified: The default setting. The token count for this level varies significantly between Gemini 3 and earlier Gemini models.low: Lower token count, resulting in faster processing and lower cost, but with less detail.medium: A balance between detail, cost, and latency.high: Higher token count, providing more detail for the model to work with, at the expense of increased latency and cost.ultra_high(Per content item only): Highest token count, required for specific use cases such as computer use.
Note that high provides the optimal performance for most use cases.
The exact number of tokens generated for each of these levels depends on both the media type (Image, Video, PDF) and the model version.
Token counts
The tables below summarize the approximate token counts for each media_resolution value and media type per model family.
Gemini 3 models
| MediaResolution | Image | Video | |
|---|---|---|---|
unspecified (Default) |
1120 | 70 | 560 |
low |
280 | 70 | 280 + Native Text |
medium |
560 | 70 | 560 + Native Text |
high |
1120 | 280 | 1120 + Native Text |
ultra_high |
2240 | N/A | N/A |
Choosing the right resolution
- Default (
unspecified): Start with the default. It's tuned for a good balance of quality, latency, and cost for most common use cases. low: Use for scenarios where cost and latency are paramount, and fine-grained detail is less critical.medium/high: Increase the resolution when the task requires understanding intricate details within the media. This is often needed for complex visual analysis, chart reading, or dense document comprehension.ultra_high- Only available for per content item setting. Recommended for specific use cases such as computer use or where testing shows a clear enhancement overhigh.- Per-content-item control (Gemini 3): Optimizes token usage. For example, in a prompt with multiple images, use
highfor a complex diagram andlowormediumfor simpler contextual images.
Recommended settings
The following lists the recommended media resolution settings for each supported media type.
| Media Type | Recommended Setting | Max Tokens | Usage Guidance |
|---|---|---|---|
| Images | high |
1120 | Recommended for most image analysis tasks to ensure maximum quality. |
| PDFs | medium |
560 | Optimal for document understanding; quality typically saturates at medium. Increasing to high rarely improves OCR results for standard documents. |
| Video (General) | low (or medium) |
70 (per frame) | Note: For video, low and medium settings are treated identically (70 tokens) to optimize context usage. This is sufficient for most action recognition and description tasks. |
| Video (Text-heavy) | high |
280 (per frame) | Required only when the use case involves reading dense text (OCR) or small details within video frames. |
Always test and evaluate the impact of different resolution settings on your application to find the best trade-off between quality, latency, and cost.
Version compatibility summary
- Setting the
resolutionon individual content items is exclusive to Gemini 3 models.
Next steps
- Learn more about the multimodal capabilities of Gemini API in the image understanding, video understanding and document understanding guides.