Memahami dan menghitung token

Gemini dan model AI generatif lainnya memproses input dan output pada perincian yang disebut token.

Untuk model Gemini, satu token setara dengan sekitar 4 karakter. 100 token setara dengan sekitar 60-80 kata dalam bahasa Inggris.

Tentang token

Token dapat berupa karakter tunggal seperti z atau seluruh kata seperti cat. Kata-kata panjang dipecah menjadi beberapa token. Kumpulan semua token yang digunakan oleh model disebut kosakata, dan proses membagi teks menjadi token disebut tokenisasi.

Jika penagihan diaktifkan, biaya panggilan ke Gemini API sebagian ditentukan oleh jumlah token input dan output, jadi mengetahui cara menghitung token dapat membantu.

Menjumlahkan token

Semua input ke dan output dari Gemini API di-tokenisasi, termasuk teks, file gambar, dan modalitas non-teks lainnya.

Anda dapat menghitung token dengan cara berikut:

  • Panggil count_tokens dengan input permintaan. Menampilkan jumlah total token dalam input saja. Lakukan panggilan ini sebelum mengirim input untuk memeriksa ukuran permintaan Anda.

  • Gunakan usage pada respons interaksi. Menampilkan jumlah token untuk input (total_input_tokens), output (total_output_tokens), pemikiran (total_thought_tokens), konten yang di-cache (total_cached_tokens), penggunaan alat (total_tool_use_tokens), dan total (total_tokens).

Menghitung token teks

Python

# This will only work for SDK newer than 2.0.0
from google import genai

client = genai.Client()
prompt = "The quick brown fox jumps over the lazy dog."

# Count tokens before sending
total_tokens = client.models.count_tokens(
    model="gemini-3-flash-preview",
    contents=prompt
)
print("total_tokens:", total_tokens)

# Get usage from interaction
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=prompt
)
print(interaction.usage)

JavaScript

// This will only work for SDK newer than 2.0.0
import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});
const prompt = "The quick brown fox jumps over the lazy dog.";

// Count tokens before sending
const countResponse = await client.models.countTokens({
    model: "gemini-3-flash-preview",
    contents: prompt,
});
console.log(countResponse.totalTokens);

// Get usage from interaction
const interaction = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: prompt,
});
console.log(interaction.usage);

REST

# Specifies the API revision to avoid breaking changes when they become default
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:countTokens" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Api-Revision: 2026-05-20" \
  -d '{"contents": [{"parts": [{"text": "The quick brown fox."}]}]}'

Menghitung token multi-giliran

Menghitung token di seluruh histori percakapan menggunakan previous_interaction_id:

Python

# This will only work for SDK newer than 2.0.0
# First interaction
interaction1 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Hi, my name is Bob"
)

# Second interaction continues the conversation
interaction2 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What's my name?",
    previous_interaction_id=interaction1.id
)

# Usage includes tokens from both turns
print(f"Input tokens: {interaction2.usage.total_input_tokens}")
print(f"Output tokens: {interaction2.usage.total_output_tokens}")
print(f"Total tokens: {interaction2.usage.total_tokens}")

JavaScript

// This will only work for SDK newer than 2.0.0
// First interaction
const interaction1 = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: "Hi, my name is Bob"
});

// Second interaction continues the conversation
const interaction2 = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: "What's my name?",
    previous_interaction_id: interaction1.id
});

console.log(`Input tokens: ${interaction2.usage.total_input_tokens}`);
console.log(`Output tokens: ${interaction2.usage.total_output_tokens}`);

Menghitung token multimodal

Semua input ke Gemini API di-tokenisasi, termasuk gambar, video, dan audio. Poin penting tentang tokenisasi:

  • Gambar: Gambar ≤384 piksel di kedua dimensi dihitung sebagai 258 token. Gambar yang lebih besar diatur menjadi ubin 768x768 piksel, yang masing-masing dihitung sebagai 258 token.
  • Video: 263 token per detik
  • Audio: 32 token per detik

Token gambar

Python

# This will only work for SDK newer than 2.0.0
uploaded_file = client.files.upload(file="path/to/image.jpg")

# Count tokens for image + text
total_tokens = client.models.count_tokens(
    model="gemini-3-flash-preview",
    contents=["Tell me about this image", uploaded_file]
)
print(f"Total tokens: {total_tokens}")

# Generate with image
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Tell me about this image"},
        {"type": "image", "uri": uploaded_file.uri, "mime_type": uploaded_file.mime_type}
    ]
)
print(interaction.usage)

JavaScript

// This will only work for SDK newer than 2.0.0
const uploadedFile = await client.files.upload({
    file: "path/to/image.jpg",
    config: { mimeType: "image/jpeg" }
});

// Count tokens
const countResponse = await client.models.countTokens({
    model: "gemini-3-flash-preview",
    contents: [
        { text: "Tell me about this image" },
        { fileData: { fileUri: uploadedFile.uri, mimeType: uploadedFile.mimeType } }
    ]
});
console.log(countResponse.totalTokens);

Contoh data inline:

Python

# This will only work for SDK newer than 2.0.0
import base64

with open('image.jpg', 'rb') as f:
    image_bytes = f.read()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Describe this image"},
        {
            "type": "image",
            "data": base64.b64encode(image_bytes).decode('utf-8'),
            "mime_type": "image/jpeg"
        }
    ]
)
print(interaction.usage)

Token video

Python

# This will only work for SDK newer than 2.0.0
import time

video_file = client.files.upload(file="path/to/video.mp4")

while not video_file.state or video_file.state.name != "ACTIVE":
    print("Processing video...")
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

# A 60-second video is approximately 263 * 60 = 15,780 tokens
total_tokens = client.models.count_tokens(
    model="gemini-3-flash-preview",
    contents=["Summarize this video", video_file]
)
print(f"Total tokens: {total_tokens}")

# Generate with video
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Summarize this video"},
        {"type": "video", "uri": video_file.uri, "mime_type": video_file.mime_type}
    ]
)
print(interaction.usage)

Token audio

Python

# This will only work for SDK newer than 2.0.0
audio_file = client.files.upload(file="path/to/audio.mp3")

# A 60-second audio clip is approximately 32 * 60 = 1,920 tokens
total_tokens = client.models.count_tokens(
    model="gemini-3-flash-preview",
    contents=["Transcribe this audio", audio_file]
)
print(f"Total tokens: {total_tokens}")

# Generate with audio
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Transcribe this audio"},
        {"type": "audio", "uri": audio_file.uri, "mime_type": audio_file.mime_type}
    ]
)
print(interaction.usage)

Menghitung token petunjuk sistem

Petunjuk sistem dihitung sebagai bagian dari token input:

Python

# This will only work for SDK newer than 2.0.0
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Hello!",
    system_instruction="You are a helpful assistant who speaks like a pirate."
)

# system_instruction tokens included in total_input_tokens
print(f"Input tokens: {interaction.usage.total_input_tokens}")

Menghitung token alat

Alat (fungsi, eksekusi kode, Google Penelusuran) juga dihitung:

Python

# This will only work for SDK newer than 2.0.0
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }
]

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What's the weather in Tokyo?",
    tools=tools
)

print(f"Input tokens: {interaction.usage.total_input_tokens}")
print(f"Tool use tokens: {interaction.usage.total_tool_use_tokens}")

Jendela konteks

Setiap model Gemini memiliki jumlah token maksimum yang dapat ditanganinya. Jendela konteks menentukan batas gabungan token input dan output.

Mendapatkan ukuran jendela konteks secara terprogram

Python

# This will only work for SDK newer than 2.0.0
model_info = client.models.get(model="gemini-3-flash-preview")
print(f"Input token limit: {model_info.input_token_limit}")
print(f"Output token limit: {model_info.output_token_limit}")

JavaScript

// This will only work for SDK newer than 2.0.0
const modelInfo = await client.models.get({ model: "gemini-3-flash-preview" });
console.log(`Input token limit: ${modelInfo.inputTokenLimit}`);
console.log(`Output token limit: ${modelInfo.outputTokenLimit}`);

Temukan ukuran jendela konteks di halaman model.

Langkah berikutnya