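Understand and count tokens

Gemini and other generative AI models process input and output at a granularity called a token.

For Gemini models, a token is equivalent to about 4 characters, and 100 tokens correspond to about 60-80 English words.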
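These rules of thumb are enough for a quick back-of-the-envelope estimate. The following is a minimal sketch: the helper names are hypothetical, and the result is only an approximation based on the ratios stated above. Use the API's count_tokens method whenever you need exact numbers.

```python
import math

# Rule of thumb from the text: about 4 characters per token for Gemini models.
CHARS_PER_TOKEN = 4


def estimate_tokens_from_chars(text: str) -> int:
    """Roughly estimate the token count of `text` from its character length.

    Heuristic only; the real tokenizer can produce a different count.
    """
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_words_from_tokens(tokens: int) -> tuple[int, int]:
    """Return a (low, high) English word range for `tokens`,
    using the ratio of about 60-80 words per 100 tokens."""
    return (tokens * 60 // 100, tokens * 80 // 100)


prompt = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens_from_chars(prompt))  # 44 characters -> 11
print(estimate_words_from_tokens(100))     # (60, 80)
```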

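About tokens

A token can be a single character, such as z, or a whole word, such as cat. Long words are split into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so it helps to know how to count them.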
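Because cost scales with token counts, you can turn counts into a cost estimate with simple arithmetic. This is a minimal sketch under stated assumptions: estimate_cost is a hypothetical helper, prices are assumed to be quoted per 1 million tokens with separate input and output rates, and the prices in the example are placeholders, not real Gemini pricing.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one call from its token counts.

    Prices are expressed per 1 million tokens. Take real values from the
    current Gemini API pricing page; none are hard-coded here.
    """
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder prices, for illustration only (NOT real Gemini pricing).
print(estimate_cost(1_000_000, 500_000, 1.0, 2.0))  # 2.0
```

In practice you would pass in the prompt_token_count and candidates_token_count values reported in a response's usage_metadata.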

Try out counting tokens in a Colab

You can try out counting tokens by using a Colab notebook.


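Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window by calling the getModels endpoint or by looking in the models documentation.

In the following example, you can see that the gemini-2.0-flash model has an input limit of about 1 million tokens and an output limit of about 8,000 tokens, which means its context window is about 1 million tokens.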

from google import genai

client = genai.Client()
model_info = client.models.get(model="gemini-2.0-flash")
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( e.g., model_info.input_token_limit=1048576, model_info.output_token_limit=8192 )
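Once you know the limits, you can check a request against them before sending it. This is a minimal sketch: fits_context_window is a hypothetical helper, it assumes you already have the prompt's token count and the output length you plan to request, and the limit values are example figures assumed here rather than read from the API.

```python
def fits_context_window(prompt_tokens: int, max_output_tokens: int,
                        input_token_limit: int, output_token_limit: int) -> bool:
    """Return True if the prompt fits the input limit and the requested
    output length fits the output limit."""
    return (prompt_tokens <= input_token_limit
            and max_output_tokens <= output_token_limit)


# Example limits (assumed here; read the real ones from model_info).
INPUT_LIMIT = 1_048_576
OUTPUT_LIMIT = 8_192

print(fits_context_window(10, 1_024, INPUT_LIMIT, OUTPUT_LIMIT))         # True
print(fits_context_window(2_000_000, 1_024, INPUT_LIMIT, OUTPUT_LIMIT))  # False
```

In real code, prompt_tokens would come from client.models.count_tokens(...).total_tokens, and the limits from model_info.input_token_limit and model_info.output_token_limit.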

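Count tokens

All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.

You can count tokens in the following ways:

Count text tokens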

from google import genai

client = genai.Client()
prompt = "The quick brown fox jumps over the lazy dog."

# Count tokens without generating a response.
count = client.models.count_tokens(
    model="gemini-2.0-flash", contents=prompt
)
print("total_tokens:", count.total_tokens)
# ( e.g., total_tokens: 10 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=prompt
)

# The usage_metadata provides detailed token counts.
print(response.usage_metadata)
# ( e.g., prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )

Count multi-turn (chat) tokens

from google import genai
from google.genai import types

client = genai.Client()

chat = client.chats.create(
    model="gemini-2.0-flash",
    history=[
        types.Content(
            role="user", parts=[types.Part(text="Hi my name is Bob")]
        ),
        types.Content(role="model", parts=[types.Part(text="Hi Bob!")]),
    ],
)
# Count tokens for the chat history.
print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=chat.get_history()
    )
)
# ( e.g., total_tokens: 10 )

response = chat.send_message(
    message="In one sentence, explain how a computer works to a young child."
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

# You can count tokens for the combined history and a new message.
extra = types.UserContent(
    parts=[
        types.Part(
            text="What is the meaning of life?",
        )
    ]
)
history = chat.get_history()
history.append(extra)
print(client.models.count_tokens(model="gemini-2.0-flash", contents=history))
# ( e.g., total_tokens: 56 )

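Count multimodal tokens

All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following high-level points about how the Gemini API tokenizes multimodal input during processing:

With Gemini 2.0, an image input whose width and height are both less than or equal to 384 pixels counts as 258 tokens. An image that is larger in one or both dimensions is cropped and scaled as needed into tiles of 768x768 pixels, and each tile counts as 258 tokens. Before Gemini 2.0, every image used a fixed 258 tokens.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.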
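The image rule above can be approximated with a tile count. This is a naive sketch, not the service's actual algorithm: estimate_image_tokens is a hypothetical helper, and it assumes 768x768 tiles are laid over the original image dimensions with no prior cropping or scaling. Because the API crops and scales images first, real counts can differ; use count_tokens for exact values.

```python
import math

TOKENS_PER_IMAGE_TILE = 258  # from the text: a small image or one tile costs 258 tokens
SMALL_IMAGE_MAX_SIDE = 384   # both sides <= 384 px -> a single 258-token unit
TILE_SIDE = 768              # larger images are split into 768x768 tiles


def estimate_image_tokens(width: int, height: int) -> int:
    """Naive Gemini 2.0 image token estimate.

    ASSUMPTION: tiles cover the original dimensions without rescaling, so the
    tile count is ceil(width / 768) * ceil(height / 768). The real service
    crops and scales first, so actual counts may differ.
    """
    if width <= SMALL_IMAGE_MAX_SIDE and height <= SMALL_IMAGE_MAX_SIDE:
        return TOKENS_PER_IMAGE_TILE
    tiles = math.ceil(width / TILE_SIDE) * math.ceil(height / TILE_SIDE)
    return tiles * TOKENS_PER_IMAGE_TILE


print(estimate_image_tokens(300, 200))   # 258
print(estimate_image_tokens(1024, 768))  # 2 tiles -> 516
```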

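Media resolution

Gemini 3 Pro Preview introduces fine-grained control over multimodal vision processing through the media_resolution parameter, which sets the maximum number of tokens allocated to each input image or video frame. Higher resolutions improve the model's ability to read fine text or identify small details, but they also increase token usage and latency.

For details about the parameter and how it affects token calculation, see the media resolution guide.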

Image files

Example that uses an image uploaded with the File API:

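import pathlib

from google import genai

# Hypothetical location of the sample media; point this at your own files.
media = pathlib.Path("path/to/media")

client = genai.Client()
prompt = "Tell me about this image"
your_image_file = client.files.upload(file=media / "organ.jpg")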

print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_image_file]
    )
)
# ( e.g., total_tokens: 263 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_image_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Example that provides the image as inline data:

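import pathlib

from google import genai
import PIL.Image

# Hypothetical location of the sample media; point this at your own files.
media = pathlib.Path("path/to/media")

client = genai.Client()
prompt = "Tell me about this image"
your_image_file = PIL.Image.open(media / "organ.jpg")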

# Count tokens for combined text and inline image.
print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_image_file]
    )
)
# ( e.g., total_tokens: 263 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_image_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Video or audio files

Audio and video are each converted to tokens at the following fixed rates:

Video: 263 tokens per second
Audio: 32 tokens per second
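With fixed per-second rates, a media file's token cost is roughly its duration multiplied by the rate. This is a minimal sketch: estimate_media_tokens is a hypothetical helper that applies only the two rates listed above and rounds up to whole tokens; it does not model any other per-request overhead, so treat the result as an estimate and confirm with count_tokens.

```python
import math

# Fixed rates from the text.
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def estimate_media_tokens(duration_seconds: float, kind: str) -> int:
    """Estimate tokens for a video or audio clip of the given duration."""
    rates = {"video": VIDEO_TOKENS_PER_SECOND, "audio": AUDIO_TOKENS_PER_SECOND}
    if kind not in rates:
        raise ValueError(f"unknown media kind: {kind!r}")
    # Round up so partial seconds are not under-counted.
    return math.ceil(duration_seconds * rates[kind])


print(estimate_media_tokens(60, "video"))  # 15780
print(estimate_media_tokens(60, "audio"))  # 1920
```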

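import pathlib
import time

from google import genai

# Hypothetical location of the sample media; point this at your own files.
media = pathlib.Path("path/to/media")

client = genai.Client()
prompt = "Tell me about this video"
your_file = client.files.upload(file=media / "Big_Buck_Bunny.mp4")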

# Poll until the video file is completely processed (state becomes ACTIVE).
while not your_file.state or your_file.state.name != "ACTIVE":
    print("Processing video...")
    print("File state:", your_file.state)
    time.sleep(5)
    your_file = client.files.get(name=your_file.name)

print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_file]
    )
)
# ( e.g., total_tokens: 300 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )

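System instructions and tools

System instructions and tools also count toward the total token count of the input.

If you use system instructions, the total_tokens count increases to reflect the addition of system_instruction.

If you use function calling, the total_tokens count increases to reflect the addition of tools.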