This guide provides an introduction to tokens and explains how to calculate Gemini API token usage. A related Colab tutorial is also available.

About tokens

Gemini and other generative AI models process input and output at a granularity that is smaller than a word but larger than a single character or code-point: a token.

Tokens can be single characters like z or whole words like the. Long words might be broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

For Gemini models, a token is equivalent to about 4 characters.
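As a rough illustration of that ratio, the sketch below estimates a token count from character length alone. `estimate_tokens` is a hypothetical helper, not part of the Gemini SDK, and the 4-characters-per-token figure is only an average, so the result is a planning estimate rather than a real count.

```python
import math

# Hypothetical helper: NOT part of the Gemini SDK.
CHARS_PER_TOKEN = 4  # rough average quoted in this guide


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate based only on character count."""
    return math.ceil(len(text) / chars_per_token)


estimate = estimate_tokens("The quick brown fox jumps over the lazy dog.")
print(estimate)  # 11 (44 characters / 4)
```

The actual tokenizer can return a different number for the same text, so use the API's token counting whenever the exact figure matters.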

When billing is enabled, the price of a paid request is determined by the number of input and output tokens, so knowing how to count tokens can be helpful.
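To make the billing relationship concrete, here is a minimal sketch of the arithmetic. The function name `estimate_cost` and the prices are hypothetical placeholders chosen for easy arithmetic; they are not actual Gemini API prices.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated request cost, given per-million-token prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Placeholder prices (assumptions), NOT real Gemini API pricing.
cost = estimate_cost(
    input_tokens=10,
    output_tokens=24,
    input_price_per_million=1.00,
    output_price_per_million=2.00,
)
print(f"{cost:.6f}")  # 0.000058
```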

Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window through the API or in the models documentation.

In the following example, the gemini-1.0-pro-latest model has an input limit of about 30k tokens and an output limit of about 2k tokens, for a total context window of about 32k tokens.

import google.generativeai as genai

model_info = genai.get_model('models/gemini-1.0-pro-latest')
(model_info.input_token_limit, model_info.output_token_limit)
# (30720, 2048)

You can also use a model like gemini-1.5-pro-latest, which has a 1M token context window.
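Once the limits are known, a request can be checked against them before it is sent. The sketch below is one possible way to do that; `fits_context` is a hypothetical helper, and the limits are the values reported for gemini-1.0-pro-latest in this section.

```python
def fits_context(
    prompt_tokens: int,
    requested_output_tokens: int,
    input_limit: int,
    output_limit: int,
) -> bool:
    """Return True if both the prompt and the requested output fit the limits."""
    return prompt_tokens <= input_limit and requested_output_tokens <= output_limit


# Limits reported for gemini-1.0-pro-latest in the example above.
INPUT_LIMIT, OUTPUT_LIMIT = 30720, 2048

print(fits_context(10, 1024, INPUT_LIMIT, OUTPUT_LIMIT))      # True
print(fits_context(40_000, 1024, INPUT_LIMIT, OUTPUT_LIMIT))  # False
```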

Count tokens

The Gemini API provides an endpoint for counting the number of tokens in a request: GenerativeModel.count_tokens. You pass the same arguments as you would to GenerativeModel.generate_content and the service returns the number of tokens in that request.

Text tokens

Here's an example of counting text tokens:

model = genai.GenerativeModel('models/gemini-1.0-pro-latest')
model.count_tokens("The quick brown fox jumps over the lazy dog.")
# total_tokens: 10

When you call GenerativeModel.generate_content (or ChatSession.send_message), the response object has a usage_metadata attribute containing both the input and output token counts (prompt_token_count and candidates_token_count):

response = model.generate_content("The quick brown fox jumps over the lazy dog.")
response.text
# 'This sentence is an example of a pangram, which is a sentence that contains all of the letters of the alphabet.'
response.usage_metadata
# prompt_token_count: 10
# candidates_token_count: 24
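If you want a running total across several requests, the counts in usage_metadata can be accumulated. The following is a minimal sketch: `UsageTotals` is a hypothetical class, and a `SimpleNamespace` stands in for a real `response.usage_metadata` object so that the example runs without calling the API.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTotals:
    """Hypothetical accumulator for token usage across responses."""

    prompt_tokens: int = 0
    candidates_tokens: int = 0

    def add(self, usage_metadata) -> None:
        # Reads the two attributes named in this guide.
        self.prompt_tokens += usage_metadata.prompt_token_count
        self.candidates_tokens += usage_metadata.candidates_token_count

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.candidates_tokens


# Stand-in for response.usage_metadata, using the counts from the example above.
fake_usage = SimpleNamespace(prompt_token_count=10, candidates_token_count=24)

totals = UsageTotals()
totals.add(fake_usage)
totals.add(fake_usage)
print(totals.total)  # 68
```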

Multi-turn tokens

Multi-turn conversational (chat) objects work similarly.

chat = model.start_chat(history=[{'role': 'user', 'parts': 'Hi my name is Bob'},
                                 {'role': 'model', 'parts': 'Hi Bob!'}])
model.count_tokens(chat.history)
# total_tokens: 10

To understand how big your next conversational turn will be, you need to append it to the history when you call count_tokens.

from google.generativeai.types.content_types import to_contents
model.count_tokens(chat.history + to_contents('What is the meaning of life?'))
# total_tokens: 17
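The idea of appending the pending turn to the history can also be sketched with plain dictionaries, without any SDK objects. `with_next_turn` is a hypothetical helper, and the dict layout mirrors the `history` argument shown earlier in this section.

```python
def with_next_turn(history: list[dict], user_message: str) -> list[dict]:
    """Return a new contents list: the history plus one pending user turn."""
    # Build a new list so the original history is left unmodified.
    return history + [{'role': 'user', 'parts': [user_message]}]


history = [
    {'role': 'user', 'parts': ['Hi my name is Bob']},
    {'role': 'model', 'parts': ['Hi Bob!']},
]

contents = with_next_turn(history, 'What is the meaning of life?')
print(len(history), len(contents))  # 2 3
print(contents[-1]['role'])         # user
```

A list like `contents` could then be passed to count_tokens, on the assumption that it accepts the same dict-style contents as generate_content.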

Multi-modal tokens

All input to the API is tokenized, including images and other non-text modalities.

Inline content

Media objects can be sent to the API inline with the request:

your_image = ...  # get image
model.count_tokens(['Tell me about this image', your_image])
# total_tokens: 263

Files API

The token count is the same if you upload parts of the prompt through the Files API instead:

your_image_upload = genai.upload_file('image.jpg')
model.count_tokens(['Tell me about this image', your_image_upload])
# total_tokens: 263

Media token counts

Internally, images are a fixed size, so they consume a fixed number of tokens, regardless of display size.

image1 = ...  # get image
image2 = ...  # get image

image1.size
# (2048, 1362)
model.count_tokens(image1)
# total_tokens: 258

image2.size
# (1068, 906)
model.count_tokens(image2)
# total_tokens: 258
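Because each image costs a fixed number of tokens, the size of a mixed text-and-image prompt can be estimated with simple arithmetic. In this sketch, `estimate_prompt_tokens` is a hypothetical helper; the 258-token figure comes from the example above, and the 5-token text count is inferred from the earlier inline example (263 total minus 258 for the image). Other models may use different values.

```python
TOKENS_PER_IMAGE = 258  # from the example above; may differ for other models


def estimate_prompt_tokens(
    text_tokens: int,
    num_images: int,
    tokens_per_image: int = TOKENS_PER_IMAGE,
) -> int:
    """Estimate total prompt tokens for text plus a number of images."""
    return text_tokens + num_images * tokens_per_image


print(estimate_prompt_tokens(text_tokens=5, num_images=1))  # 263
print(estimate_prompt_tokens(text_tokens=5, num_images=3))  # 779
```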

Audio and video are each converted to tokens at a fixed rate of tokens per minute.

audio_sample = genai.upload_file('sample.mp3')
model.count_tokens(audio_sample)
# total_tokens: 83552
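A fixed conversion rate means media token counts scale linearly with duration. The sketch below shows that arithmetic with hypothetical helpers. The rate of 32 tokens per second for audio is an assumption that this guide does not state, so check the current model documentation before relying on it.

```python
# ASSUMPTION: this rate is not stated in this guide; verify it for your model.
ASSUMED_AUDIO_TOKENS_PER_SECOND = 32


def estimate_media_tokens(duration_seconds: float, tokens_per_second: float) -> int:
    """Estimate tokens for a clip converted at a fixed per-second rate."""
    return round(duration_seconds * tokens_per_second)


def implied_duration_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Invert the estimate: how long a clip a given token count implies."""
    return total_tokens / tokens_per_second


print(estimate_media_tokens(60, ASSUMED_AUDIO_TOKENS_PER_SECOND))        # 1920
print(implied_duration_seconds(83552, ASSUMED_AUDIO_TOKENS_PER_SECOND))  # 2611.0
```

Under that assumed rate, the 83552 tokens reported above would correspond to a clip of roughly 43.5 minutes.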

System instructions and tools

System instructions and tools also count towards the total, as shown in these examples:

genai.GenerativeModel().count_tokens("The quick brown fox jumps over the lazy dog.")
# total_tokens: 10

genai.GenerativeModel(system_instruction='Talk like a pirate!').count_tokens("The quick brown fox jumps over the lazy dog.")
# total_tokens: 15

def add(a:float, b:float):
    """returns a + b."""
    return a+b

def subtract(a:float, b:float):
    """returns a - b."""
    return a-b

def multiply(a:float, b:float):
    """returns a * b."""
    return a*b

def divide(a:float, b:float):
    """returns a / b."""
    return a/b

model = genai.GenerativeModel(model_name='gemini-1.0-pro',
                              tools=[add, subtract, multiply, divide])
model.count_tokens("The quick brown fox jumps over the lazy dog.")
# total_tokens: 194
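The cost of a system instruction or a set of tools can be read off by subtracting the count for the bare prompt. This short sketch uses the numbers from the examples above; `overhead` is a hypothetical helper, and it assumes the bare prompt counts as 10 tokens for each model involved.

```python
def overhead(baseline_tokens: int, total_tokens: int) -> int:
    """Return how many tokens were added on top of the bare prompt."""
    return total_tokens - baseline_tokens


BASELINE = 10  # bare prompt, from the examples above

print(overhead(BASELINE, 15))   # 5   (system instruction)
print(overhead(BASELINE, 194))  # 184 (four tool declarations)
```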

Further reading

For more on token counting, check out the API reference.