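Understand and count tokens

Gemini and other generative AI models process input and output at a granularity called a token.

About tokens

Tokens can be single characters like z or whole words like cat. Long words are broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

For Gemini models, a token is equivalent to about 4 characters. 100 tokens are equal to about 60-80 English words.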
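
For a quick sanity check before calling the API, the rule of thumb above can be turned into a rough estimator. This is only a sketch: estimate_tokens is a hypothetical helper, not part of the Gemini SDK, and the 4-characters-per-token ratio is an approximation for English text, so call count_tokens when an exact figure matters.

```python
import math

# Rule of thumb from the text: one token is roughly 4 characters.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of `text` (hypothetical helper)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prompt))  # 44 characters -> 11
```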

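When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.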
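
To make the link between token counts and cost concrete, here is a minimal sketch of a cost estimate. The function name and the prices are placeholders chosen for illustration, not real Gemini API prices; check the current pricing page for actual rates.

```python
def estimate_cost(
    prompt_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request from its token counts.

    Prices are per one million tokens and must be supplied by the caller;
    the values used below are placeholders, not real Gemini API prices.
    """
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices, for illustration only.
cost = estimate_cost(
    prompt_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_million=0.50,
    output_price_per_million=2.00,
)
print(f"{cost:.2f}")  # 2.00
```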

Try out counting tokens in a Colab

You can try out counting tokens by using a Colab.

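Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window by calling the getModels endpoint or by looking in the models documentation.

In the following example, you can see that the gemini-1.5-flash model has an input limit of about 1,000,000 tokens and an output limit of about 8,000 tokens, which means the context window is about 1,000,000 tokens.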

import google.generativeai as genai

model_info = genai.get_model("models/gemini-1.5-flash")

# Returns the "context window" for the model,
# which is the combined input and output token limits.
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( input_token_limit=1048576, output_token_limit=8192 )
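
Once the limits are known, a request can be checked against them before it is sent. The sketch below is an illustration only: check_request_size is a hypothetical helper, and the limits passed in are the approximate figures quoted above rather than values read from the API.

```python
def check_request_size(
    prompt_tokens: int,
    requested_output_tokens: int,
    input_token_limit: int,
    output_token_limit: int,
) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if prompt_tokens > input_token_limit:
        problems.append(
            f"prompt has {prompt_tokens} tokens, limit is {input_token_limit}"
        )
    if requested_output_tokens > output_token_limit:
        problems.append(
            f"requested {requested_output_tokens} output tokens, "
            f"limit is {output_token_limit}"
        )
    return problems


# Approximate limits quoted in the text for gemini-1.5-flash.
print(check_request_size(1_200_000, 4_000, 1_000_000, 8_000))
```

In practice the two limits would come from model_info.input_token_limit and model_info.output_token_limit, and the prompt size from a count_tokens call.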

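Count tokens

All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.

You can count tokens in the following ways:

Call count_tokens with the input of the request.
This returns the total number of tokens in the input only. You can make this call before sending the input to the model to check the size of your requests.
Use the usage_metadata attribute on the response object after calling generate_content.
This returns the total number of tokens in both the input and the output: total_token_count.
It also returns the token counts of the input and output separately: prompt_token_count (input tokens) and candidates_token_count (output tokens).

Count text tokens

If you call count_tokens with a text-only input, it returns the token count of the text in the input only (total_tokens). You can make this call before calling generate_content to check the size of your requests.

Another option is calling generate_content and then using the usage_metadata attribute on the response object to get the following:

The separate token counts of the input (prompt_token_count) and the output (candidates_token_count)
The total number of tokens in both the input and the output (total_token_count)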

import google.generativeai as genai

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

# Call `count_tokens` to get the input token count (`total_tokens`).
print("total_tokens: ", model.count_tokens(prompt))
# ( total_tokens: 10 )

response = model.generate_content(prompt)

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )
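
When an application makes several calls, the per-response counts can be added up to track overall usage. The following is a minimal sketch; Usage is a stand-in dataclass defined here for illustration, not the SDK's own usage_metadata type, and sum_usage is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Stand-in for the three counts reported in `usage_metadata`."""

    prompt_token_count: int
    candidates_token_count: int
    total_token_count: int


def sum_usage(usages: list[Usage]) -> Usage:
    """Add up token counts across several responses."""
    return Usage(
        prompt_token_count=sum(u.prompt_token_count for u in usages),
        candidates_token_count=sum(u.candidates_token_count for u in usages),
        total_token_count=sum(u.total_token_count for u in usages),
    )


# Counts copied from two sample outputs on this page.
session = sum_usage([Usage(11, 73, 84), Usage(25, 21, 46)])
print(session)
```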

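Count multi-turn (chat) tokens

If you call count_tokens with the chat history, it returns the total token count of the text from each role in the chat (total_tokens).

Another option is calling send_message and then using the usage_metadata attribute on the response object to get the following:

The separate token counts of the input (prompt_token_count) and the output (candidates_token_count)
The total number of tokens in both the input and the output (total_token_count)

To understand how big your next conversational turn will be, you need to append it to the history when you call count_tokens.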

import google.generativeai as genai

model = genai.GenerativeModel("models/gemini-1.5-flash")

chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hi my name is Bob"},
        {"role": "model", "parts": "Hi Bob!"},
    ]
)
# Call `count_tokens` to get the input token count (`total_tokens`).
print(model.count_tokens(chat.history))
# ( total_tokens: 10 )

response = chat.send_message(
    "In one sentence, explain how a computer works to a young child."
)

# On the response for `send_message`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

from google.generativeai.types.content_types import to_contents

# You can call `count_tokens` on the combined history and content of the next turn.
print(model.count_tokens(chat.history + to_contents("What is the meaning of life?")))
# ( total_tokens: 56 )
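
The idea of appending the next turn before counting can be wrapped in a small helper. This is a sketch under stated assumptions: count_with_next_turn and rough_count are hypothetical names, and rough_count is an offline stand-in (about 4 characters per token) so the example runs without an API key.

```python
from typing import Callable

# A chat turn in the same dict shape used for `history` above.
Turn = dict[str, str]


def count_with_next_turn(
    history: list[Turn],
    next_message: str,
    count_fn: Callable[[list[Turn]], int],
) -> int:
    """Count tokens for the history plus a not-yet-sent user message."""
    pending = {"role": "user", "parts": next_message}
    # Build a new list so the real history is left untouched.
    return count_fn(history + [pending])


def rough_count(turns: list[Turn]) -> int:
    """Offline stand-in counter: about 4 characters per token, rounded up."""
    return sum(-(-len(t["parts"]) // 4) for t in turns)


history = [
    {"role": "user", "parts": "Hi my name is Bob"},
    {"role": "model", "parts": "Hi Bob!"},
]
print(count_with_next_turn(history, "What is the meaning of life?", rough_count))
```

With the real SDK, count_fn could be something like lambda turns: model.count_tokens(turns).total_tokens, which would return the model's actual count instead of the rough estimate.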

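Count multimodal tokens

All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following key points about tokenization of multimodal input during processing by the Gemini API:

With Gemini 2.0, image inputs with both dimensions less than or equal to 384 pixels are counted as 258 tokens. Images larger in one or both dimensions are cropped and scaled as needed into tiles of 768x768 pixels, each counted as 258 tokens. Prior to Gemini 2.0, images used a fixed 258 tokens.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.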
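
The Gemini 2.0 image rule can be approximated in a few lines. Treat this strictly as a sketch: the text does not spell out how large images are cropped and scaled, so the code assumes a plain grid of 768x768 tiles with no rescaling, and estimate_image_tokens is a hypothetical helper. Use count_tokens for the authoritative number.

```python
import math

TOKENS_PER_TILE = 258
SMALL_IMAGE_MAX = 384  # pixels, per dimension
TILE_SIZE = 768  # pixels, per dimension


def estimate_image_tokens(width: int, height: int) -> int:
    """Estimate image tokens under the Gemini 2.0 rule described above.

    ASSUMPTION: large images are covered by a plain grid of 768x768 tiles
    with no rescaling. The real cropping and scaling logic may differ.
    """
    if width <= SMALL_IMAGE_MAX and height <= SMALL_IMAGE_MAX:
        return TOKENS_PER_TILE
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE


print(estimate_image_tokens(300, 200))   # small image -> 258
print(estimate_image_tokens(1536, 768))  # 2 tiles -> 516
```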

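Image files

If you call count_tokens with a text-and-image input, it returns the combined token count of the text and the image in the input only (total_tokens). You can make this call before calling generate_content to check the size of your requests. You can also optionally call count_tokens on the text and the file separately.

Another option is calling generate_content and then using the usage_metadata attribute on the response object to get the following:

The separate token counts of the input (prompt_token_count) and the output (candidates_token_count)
The total number of tokens in both the input and the output (total_token_count)

Example that uses an uploaded image from the File API: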

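import pathlib

import google.generativeai as genai

# Assumption: sample files live in a local "media" directory; adjust as needed.
media = pathlib.Path("media")

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = genai.upload_file(path=media / "organ.jpg")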

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])
response.text
# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Example that provides the image as inline data:

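import pathlib

import google.generativeai as genai
import PIL.Image

# Assumption: sample files live in a local "media" directory; adjust as needed.
media = pathlib.Path("media")

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = PIL.Image.open(media / "organ.jpg")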

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Video or audio files

Audio and video are each converted to tokens at the following fixed rates:

Video: 263 tokens per second
Audio: 32 tokens per second
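
Because the rates are fixed, the token count of a clip can be estimated from its duration alone. A minimal sketch, with estimate_media_tokens as a hypothetical helper; the result covers only the media file, not any accompanying text prompt.

```python
# Fixed conversion rates stated above, in tokens per second.
TOKENS_PER_SECOND = {"video": 263, "audio": 32}


def estimate_media_tokens(kind: str, duration_seconds: float) -> int:
    """Estimate tokens for a video or audio clip from its duration."""
    if kind not in TOKENS_PER_SECOND:
        raise ValueError(f"unknown media kind: {kind!r}")
    return round(TOKENS_PER_SECOND[kind] * duration_seconds)


print(estimate_media_tokens("video", 60))  # 15780
print(estimate_media_tokens("audio", 60))  # 1920
```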

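If you call count_tokens with a text-and-video/audio input, it returns the combined token count of the text and the video/audio file in the input only (total_tokens). You can make this call before calling generate_content to check the size of your requests. You can also optionally call count_tokens on the text and the file separately.

Another option is calling generate_content and then using the usage_metadata attribute on the response object to get the following:

The separate token counts of the input (prompt_token_count) and the output (candidates_token_count)
The total number of tokens in both the input and the output (total_token_count)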

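import pathlib
import time

import google.generativeai as genai

# Assumption: sample files live in a local "media" directory; adjust as needed.
media = pathlib.Path("media")

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this video"
your_file = genai.upload_file(path=media / "Big_Buck_Bunny.mp4")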

# Videos need to be processed before you can use them.
while your_file.state.name == "PROCESSING":
    print("processing video...")
    time.sleep(5)
    your_file = genai.get_file(your_file.name)

# Call `count_tokens` to get the input token count
# of the combined text and video/audio file (`total_tokens`).
# A video or audio file is converted to tokens at a fixed rate of tokens per second.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_file]))
# ( total_tokens: 300 )

response = model.generate_content([prompt, your_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )

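System instructions and tools

System instructions and tools also count towards the total token count for the input.

If you use system instructions, the total_tokens count increases to reflect the addition of system_instruction.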

import google.generativeai as genai

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

print(model.count_tokens(prompt))
# ( total_tokens: 10 )

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash", system_instruction="You are a cat. Your name is Neko."
)

# The total token count includes everything sent to the `generate_content` request.
# When you use system instructions, the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 21 )

If you use function calling, the total_tokens count increases to reflect the addition of tools.

import google.generativeai as genai

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "I have 57 cats, each owns 44 mittens, how many mittens is that in total?"

print(model.count_tokens(prompt))
# ( total_tokens: 22 )

def add(a: float, b: float):
    """returns a + b."""
    return a + b

def subtract(a: float, b: float):
    """returns a - b."""
    return a - b

def multiply(a: float, b: float):
    """returns a * b."""
    return a * b

def divide(a: float, b: float):
    """returns a / b."""
    return a / b

model = genai.GenerativeModel(
    "models/gemini-1.5-flash-001", tools=[add, subtract, multiply, divide]
)

# The total token count includes everything sent to the `generate_content` request.
# When you use tools (like function calling), the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 206 )
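
Comparing the counts with and without the extra configuration shows how many tokens it contributes to every request. A small worked sketch using the sample outputs on this page; added_tokens is a hypothetical helper name.

```python
def added_tokens(count_with: int, count_without: int) -> int:
    """Tokens contributed by a system instruction or tool declarations."""
    return count_with - count_without


# Counts taken from the sample outputs on this page.
print(added_tokens(21, 10))   # system instruction -> 11
print(added_tokens(206, 22))  # four tool declarations -> 184
```

Since this overhead is paid on each request, it can be worth measuring once and including in any budget for prompt size.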