Understand and count tokens
Gemini and other generative AI models process input and output at a granularity called a token.
For Gemini models, a token is equivalent to about 4 characters. 100 tokens is equal to about 60-80 English words.
About tokens
Tokens can be single characters like z or whole words like cat. Long words
are broken up into several tokens. The set of all tokens used by the model is
called the vocabulary, and the process of splitting text into tokens is called
tokenization.
When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.
Count tokens
All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.
You can count tokens in the following ways:
Call
count_tokenswith the input of the request. Returns the total number of tokens in the input only. Make this call before sending input to check the size of your requests.Use the
usageon the interaction response. Returns token counts for input (total_input_tokens), output (total_output_tokens), thinking (total_thought_tokens), cached content (total_cached_tokens), tool use (total_tool_use_tokens), and total (total_tokens).
Count text tokens
Python
from google import genai
client = genai.Client()
prompt = "The quick brown fox jumps over the lazy dog."
# Count tokens before sending
total_tokens = client.models.count_tokens(
model="gemini-3-flash-preview",
contents=prompt
)
print("total_tokens:", total_tokens)
# Get usage from interaction
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=prompt
)
print(interaction.usage)
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
const prompt = "The quick brown fox jumps over the lazy dog.";
// Count tokens before sending
const countResponse = await client.models.countTokens({
model: "gemini-3-flash-preview",
contents: prompt,
});
console.log(countResponse.totalTokens);
// Get usage from interaction
const interaction = await client.interactions.create({
model: "gemini-3-flash-preview",
input: prompt,
});
console.log(interaction.usage);
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:countTokens" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"contents": [{"parts": [{"text": "The quick brown fox."}]}]}'
Count multi-turn tokens
Count tokens across conversation history using previous_interaction_id:
Python
# First interaction
interaction1 = client.interactions.create(
model="gemini-3-flash-preview",
input="Hi, my name is Bob"
)
# Second interaction continues the conversation
interaction2 = client.interactions.create(
model="gemini-3-flash-preview",
input="What's my name?",
previous_interaction_id=interaction1.id
)
# Usage includes tokens from both turns
print(f"Input tokens: {interaction2.usage.total_input_tokens}")
print(f"Output tokens: {interaction2.usage.total_output_tokens}")
print(f"Total tokens: {interaction2.usage.total_tokens}")
JavaScript
// First interaction
const interaction1 = await client.interactions.create({
model: "gemini-3-flash-preview",
input: "Hi, my name is Bob"
});
// Second interaction continues the conversation
const interaction2 = await client.interactions.create({
model: "gemini-3-flash-preview",
input: "What's my name?",
previousInteractionId: interaction1.id
});
console.log(`Input tokens: ${interaction2.usage.totalInputTokens}`);
console.log(`Output tokens: ${interaction2.usage.totalOutputTokens}`);
Count multimodal tokens
All input to the Gemini API is tokenized, including images, video, and audio. Key points about tokenization:
- Images: Images ≤384 pixels in both dimensions count as 258 tokens. Larger images are tiled into 768x768 pixel tiles, each counting as 258 tokens.
- Video: 263 tokens per second
- Audio: 32 tokens per second
Image tokens
Python
uploaded_file = client.files.upload(file="path/to/image.jpg")
# Count tokens for image + text
total_tokens = client.models.count_tokens(
model="gemini-3-flash-preview",
contents=["Tell me about this image", uploaded_file]
)
print(f"Total tokens: {total_tokens}")
# Generate with image
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "text", "text": "Tell me about this image"},
{"type": "image", "uri": uploaded_file.uri, "mime_type": uploaded_file.mime_type}
]
)
print(interaction.usage)
JavaScript
const uploadedFile = await client.files.upload({
file: "path/to/image.jpg",
config: { mimeType: "image/jpeg" }
});
// Count tokens
const countResponse = await client.models.countTokens({
model: "gemini-3-flash-preview",
contents: [
{ text: "Tell me about this image" },
{ fileData: { fileUri: uploadedFile.uri, mimeType: uploadedFile.mimeType } }
]
});
console.log(countResponse.totalTokens);
Inline data example:
Python
import base64
with open('image.jpg', 'rb') as f:
image_bytes = f.read()
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "text", "text": "Describe this image"},
{
"type": "image",
"data": base64.b64encode(image_bytes).decode('utf-8'),
"mime_type": "image/jpeg"
}
]
)
print(interaction.usage)
Video tokens
Python
import time
video_file = client.files.upload(file="path/to/video.mp4")
while not video_file.state or video_file.state.name != "ACTIVE":
print("Processing video...")
time.sleep(5)
video_file = client.files.get(name=video_file.name)
# A 60-second video is approximately 263 * 60 = 15,780 tokens
total_tokens = client.models.count_tokens(
model="gemini-3-flash-preview",
contents=["Summarize this video", video_file]
)
print(f"Total tokens: {total_tokens}")
# Generate with video
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "text", "text": "Summarize this video"},
{"type": "video", "uri": video_file.uri, "mime_type": video_file.mime_type}
]
)
print(interaction.usage)
Audio tokens
Python
audio_file = client.files.upload(file="path/to/audio.mp3")
# A 60-second audio clip is approximately 32 * 60 = 1,920 tokens
total_tokens = client.models.count_tokens(
model="gemini-3-flash-preview",
contents=["Transcribe this audio", audio_file]
)
print(f"Total tokens: {total_tokens}")
# Generate with audio
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "text", "text": "Transcribe this audio"},
{"type": "audio", "uri": audio_file.uri, "mime_type": audio_file.mime_type}
]
)
print(interaction.usage)
Count system instruction tokens
System instructions are counted as part of the input tokens:
Python
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input="Hello!",
system_instruction="You are a helpful assistant who speaks like a pirate."
)
# system_instruction tokens included in total_input_tokens
print(f"Input tokens: {interaction.usage.total_input_tokens}")
Count tool tokens
Tools (functions, code execution, Google Search) are also counted:
Python
tools = [
{
"type": "function",
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
}
}
}
]
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input="What's the weather in Tokyo?",
tools=tools
)
print(f"Input tokens: {interaction.usage.total_input_tokens}")
print(f"Tool use tokens: {interaction.usage.total_tool_use_tokens}")
Context window
Each Gemini model has a maximum number of tokens it can handle. The context window defines the combined limit of input and output tokens.
Get context window size programmatically
Python
model_info = client.models.get(model="gemini-3-flash-preview")
print(f"Input token limit: {model_info.input_token_limit}")
print(f"Output token limit: {model_info.output_token_limit}")
JavaScript
const modelInfo = await client.models.get({ model: "gemini-3-flash-preview" });
console.log(`Input token limit: ${modelInfo.inputTokenLimit}`);
console.log(`Output token limit: ${modelInfo.outputTokenLimit}`);
Find context window sizes on the models page.
What's next
- Text generation: Generation basics
- Caching: Reduce costs with caching
- Pricing: Understand costs