Gemini and other generative AI models process input and output at a granularity called a token.
This guide explains how to get the context windows of specific models, as well as how to count tokens for use cases like text input, chat, multimodal input, and system instructions and tools.
About tokens
Tokens can be single characters like z or whole words like cat. Long words are broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.
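As a rough rule of thumb (an approximation, not part of the API), one token corresponds to about 4 characters of common English text. A minimal sketch for a quick pre-flight estimate, before making any API calls:

```python
# Rough heuristic only: ~4 characters per token for common English text.
# For exact counts, use the count_tokens API call shown later in this guide.
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Approximate the token count of a string from its character length."""
    return max(1, len(text) // chars_per_token)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

The heuristic is close but not exact (the API reports 10 tokens for this sentence), so treat it only as a sizing aid.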
When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.
Context windows
The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window using the API or by looking in the models documentation.
In the following example, you can see that the gemini-1.0-pro-001
model has an
input limit of about 30K tokens and an output limit of about 2K tokens, which
means a context window of about 32K tokens.
import google.generativeai as genai

model_info = genai.get_model('models/gemini-1.0-pro-001')
(model_info.input_token_limit, model_info.output_token_limit)
# input_token_limit: 30720
# output_token_limit: 2048
As another example, if you instead requested the token limits for a model like
gemini-1.5-flash-001
, you'd see that it has a 2M context window.
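Before sending a large request, you can compare your token counts against these limits. A minimal sketch, using the gemini-1.0-pro-001 limits quoted above as hard-coded assumed constants (in practice, fetch them with genai.get_model):

```python
# Limits quoted above for gemini-1.0-pro-001; in real code, read
# input_token_limit and output_token_limit from genai.get_model() instead.
INPUT_TOKEN_LIMIT = 30720
OUTPUT_TOKEN_LIMIT = 2048

def fits_in_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Check a request against the model's input and output limits."""
    return (prompt_tokens <= INPUT_TOKEN_LIMIT
            and max_output_tokens <= OUTPUT_TOKEN_LIMIT)

print(fits_in_context(10, 1024))     # True: well within both limits
print(fits_in_context(40000, 1024))  # False: prompt exceeds the input limit
```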
Count tokens
All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.
You can count tokens in the following ways:
- Call count_tokens with the input of the request. This returns the total number of tokens in the input only. You can make this call before sending the input to the model to check the size of your requests.
- Use the usage_metadata attribute on the response object after calling generate_content. This returns the total number of tokens in both the input and the output (specifically, prompt_token_count for input tokens and candidates_token_count for output tokens).
Count text tokens
Calling count_tokens
with a text-only input returns a total_tokens
value
that is the token count of the text in the input only.
model = genai.GenerativeModel('models/gemini-1.5-flash')
prompt = "The quick brown fox jumps over the lazy dog."
print(model.count_tokens(prompt))
# total_tokens: 10
If you call generate_content
and use the usage_metadata
attribute on the
response
object, you can count the total number of tokens in
both the input and the output. This attribute returns both
prompt_token_count
(input tokens) and candidates_token_count
(output tokens).
model = genai.GenerativeModel('models/gemini-1.5-flash')
prompt = "The quick brown fox jumps over the lazy dog."
response = model.generate_content(prompt)
print(response.text)
print(response.usage_metadata)
# text output: 'This sentence is an example of a pangram, which is a sentence that contains all of the letters of the alphabet.'
# prompt_token_count: 10
# candidates_token_count: 24
Count multi-turn (chat) tokens
Calling count_tokens
with the chat history returns a total_tokens
value that
is the token count of the text from each role in the chat.
model = genai.GenerativeModel('models/gemini-1.5-flash')
chat = model.start_chat(history=[{'role':'user', 'parts':'Hi my name is Bob'}, {'role':'model', 'parts':'Hi Bob!'}])
model.count_tokens(chat.history)
# total_tokens: 10
To understand how big your next conversational turn will be, you need to append
it to the history when you call count_tokens
.
model.count_tokens(chat.history + [{'role':'user', 'parts':['What is the meaning of life?']}])
# total_tokens: 17
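Counting the history plus the pending turn this way also lets you keep long conversations within a token budget. A hypothetical sketch that drops the oldest turns until the history fits (the per-turn counts would come from count_tokens calls in practice):

```python
# Hypothetical sketch: trim a chat history to a token budget by dropping the
# oldest turns first. Each entry is the token count of one turn, oldest first.
def trim_history(turn_token_counts: list[int], budget: int) -> list[int]:
    """Return the longest suffix of turns whose token counts fit the budget."""
    total = 0
    kept = []
    for count in reversed(turn_token_counts):  # walk newest turns first
        if total + count > budget:
            break
        total += count
        kept.append(count)
    return list(reversed(kept))

print(trim_history([6, 4, 7], budget=12))  # [4, 7]: oldest 6-token turn dropped
```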
Count multimodal tokens
All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following high-level key points about tokenization of multimodal input during processing by the Gemini API:
Images are considered to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.
Image files
During processing, the Gemini API considers images to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.
Calling count_tokens
with a text-and-image input returns a total_tokens
value that is the combined token count of the text and the image in
the input only.
Note that you'll get the same total_tokens value regardless of whether you use a file uploaded through the File API or provide the file as inline data.
Example that uses an uploaded image from the File API:
model = genai.GenerativeModel('models/gemini-1.5-flash')
prompt = "Tell me about this image"
# An image's token count is always 258 tokens (regardless of its display or file size).
your_image_file = genai.upload_file(path="image.jpg")
print(model.count_tokens([prompt, your_image_file]))
# total_tokens: 263
Example that provides the image as inline data:
import PIL.Image

model = genai.GenerativeModel('models/gemini-1.5-flash')
prompt = "Tell me about this image"
# An image's token count is always 258 tokens (regardless of its display or file size).
your_image_file = PIL.Image.open("image.jpg")  # for example, a local PIL image
model.count_tokens([prompt, your_image_file])
# total_tokens: 263
Video or audio files
Audio and video are each converted to tokens at the following fixed rates:
- Video: 263 tokens per second
- Audio: 32 tokens per second
Calling count_tokens
with a text-and-video/audio input returns a
total_tokens
value that is the combined token count of the text and the
video/audio file in the input only.
model = genai.GenerativeModel('models/gemini-1.5-flash')
prompt = "Tell me about this video"
# A video or audio file is converted to tokens at a fixed rate of tokens per second.
your_media_file = genai.upload_file(path="sample.mp4")
print(model.count_tokens([prompt, your_media_file]))
# total_tokens: 83552
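Given the fixed per-modality rates above (258 tokens per image, 263 tokens per second of video, 32 tokens per second of audio), you can estimate the media portion of a request before uploading anything. A sketch with assumed inputs, text counted separately:

```python
# Fixed media rates documented above; the text part of a prompt is counted
# separately by count_tokens.
IMAGE_TOKENS = 258
VIDEO_TOKENS_PER_SEC = 263
AUDIO_TOKENS_PER_SEC = 32

def estimate_media_tokens(images: int = 0,
                          video_seconds: float = 0,
                          audio_seconds: float = 0) -> int:
    """Estimate the media token usage of a multimodal request."""
    return int(images * IMAGE_TOKENS
               + video_seconds * VIDEO_TOKENS_PER_SEC
               + audio_seconds * AUDIO_TOKENS_PER_SEC)

print(estimate_media_tokens(images=1))          # 258
print(estimate_media_tokens(video_seconds=60))  # 15780
```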
System instructions and tools
System instructions and tools also count towards the total token count for the input.
In this example, only a text prompt is included in the request. Note the
total_tokens
count.
model = genai.GenerativeModel('models/gemini-1.5-flash')
prompt = "The quick brown fox jumps over the lazy dog."
model.count_tokens(prompt)
# total_tokens: 10
If you use system instructions, the total_tokens
count increases to reflect
the addition of system_instruction
.
model = genai.GenerativeModel(model_name='gemini-1.5-flash',
system_instruction='Talk like a pirate!')
prompt = "The quick brown fox jumps over the lazy dog."
model.count_tokens(prompt)
# total_tokens: 15
If you use function calling, the total_tokens
count increases to reflect the
addition of tools
.
def add(a: float, b: float):
    """returns a + b."""
    return a + b

def subtract(a: float, b: float):
    """returns a - b."""
    return a - b

def multiply(a: float, b: float):
    """returns a * b."""
    return a * b

def divide(a: float, b: float):
    """returns a / b."""
    return a / b
model = genai.GenerativeModel(model_name='gemini-1.5-flash',
tools=[add, subtract, multiply, divide])
prompt = "The quick brown fox jumps over the lazy dog."
model.count_tokens(prompt)
# total_tokens: 194
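Because count_tokens reflects everything attached to the model, you can attribute overhead by differencing counts. Using the numbers reported in the three examples above:

```python
# Token counts reported by the three count_tokens calls above.
base = 10         # prompt only
with_system = 15  # prompt + system instruction
with_tools = 194  # prompt + four tool declarations

print(with_system - base)  # 5 tokens added by the system instruction
print(with_tools - base)   # 184 tokens added by the tool declarations
```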
Further reading
For more on token counting, see the API reference:
- countTokens (REST)
- count_tokens (Python)