Batch Mode

The Gemini API's Batch Mode is designed to process large volumes of requests asynchronously at 50% of the standard cost. The target turnaround time is 24 hours, but in most cases it is much faster.

Use Batch Mode for large-scale, non-urgent tasks where you don't need an immediate response, such as data pre-processing or running evaluations.

Creating a batch job

There are two ways to submit requests in Batch Mode:

  • Inline requests: A list of GenerateContentRequest objects included directly in the batch creation request. This is suitable for smaller batches where the total request size stays under 20MB. The model returns its output as a list of inlineResponse objects.
  • Input file: A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The model returns its output as a JSONL file, where each line is either a GenerateContentResponse or a status object.

Inline requests

For a small number of requests, you can embed the GenerateContentRequest objects directly in your BatchGenerateContentRequest. The following example calls the BatchGenerateContent method with inline requests:

Python


from google import genai
from google.genai import types

client = genai.Client()

# A list of dictionaries, where each is a GenerateContentRequest
inline_requests = [
    {
        'contents': [{
            'parts': [{'text': 'Tell me a one-sentence joke.'}],
            'role': 'user'
        }]
    },
    {
        'contents': [{
            'parts': [{'text': 'Why is the sky blue?'}],
            'role': 'user'
        }]
    }
]

inline_batch_job = client.batches.create(
    model="models/gemini-2.5-flash",
    src=inline_requests,
    config={
        'display_name': "inlined-requests-job-1",
    },
)

print(f"Created batch job: {inline_batch_job.name}")

REST

curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-X POST \
-H "Content-Type:application/json" \
-d '{
    "batch": {
        "display_name": "my-batch-requests",
        "input_config": {
            "requests": {
                "requests": [
                    {
                        "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]},
                        "metadata": {
                            "key": "request-1"
                        }
                    },
                    {
                        "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]},
                        "metadata": {
                            "key": "request-2"
                        }
                    }
                ]
            }
        }
    }
}'

Input file

For larger numbers of requests, prepare a JSON Lines (JSONL) file. Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, a request defined with the key request-1 will have its response annotated with the same key name.

This file is uploaded using the File API. The maximum size for an input file is 2 GB.

The following is an example of a JSONL file. You can save it in a file named my-batch-requests.jsonl:

{"key": "request-1", "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generation_config": {"temperature": 0.7}}}
{"key": "request-2", "request": {"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}}

As with inline requests, you can specify other parameters, such as system instructions, tools, or other configurations, in each request JSON.
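For example, a single JSONL line carrying both a system instruction and a generation config could be built like this. This is a minimal sketch: the system_instruction and generation_config fields follow the GenerateContentRequest shape used elsewhere on this page, and the key and prompt text are illustrative placeholders.

```python
import json

# A sketch of one JSONL line that carries extra per-request settings
# alongside the prompt. The "key" is user-defined and is echoed back
# in the corresponding response line.
request_line = {
    "key": "request-3",
    "request": {
        "contents": [{
            "parts": [{"text": "Summarize the plot of Hamlet in two sentences."}],
            "role": "user"
        }],
        "system_instruction": {
            "parts": [{"text": "You are a concise literary critic."}]
        },
        "generation_config": {"temperature": 0.2}
    }
}

# Each JSONL entry must be a single JSON object on one line.
line = json.dumps(request_line)
```

Appending one such line per request to the file gives you a valid batch input.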

You can upload this file using the File API, as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file.

Python


import json

from google import genai
from google.genai import types

client = genai.Client()

# Create a sample JSONL file
with open("my-batch-requests.jsonl", "w") as f:
    requests = [
        {"key": "request-1", "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}},
        {"key": "request-2", "request": {"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}}
    ]
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload the file to the File API
uploaded_file = client.files.upload(
    file='my-batch-requests.jsonl',
    config=types.UploadFileConfig(display_name='my-batch-requests', mime_type='jsonl')
)

print(f"Uploaded file: {uploaded_file.name}")

REST

tmp_batch_input_file=batch_input.tmp
echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp
MIME_TYPE=$(file -b --mime-type "${tmp_batch_input_file}")
NUM_BYTES=$(wc -c < "${tmp_batch_input_file}")
DISPLAY_NAME=BatchInput

tmp_header_file=upload-header.tmp

# Initial resumable request defining metadata.
# The upload URL is in the response headers; dump them to a file.
curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \
-D "${tmp_header_file}" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "X-Goog-Upload-Protocol: resumable" \
-H "X-Goog-Upload-Command: start" \
-H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
-H "Content-Type: application/jsonl" \
-d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null

upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"

# Upload the actual bytes.
curl "${upload_url}" \
-H "Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Offset: 0" \
-H "X-Goog-Upload-Command: upload, finalize" \
--data-binary "@${tmp_batch_input_file}" 2> /dev/null > file_info.json

file_uri=$(jq ".file.uri" file_info.json)
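As noted above, if you are working with multimodal input, each request in your JSONL file can reference files you have already uploaded with the File API. The following is a minimal sketch of one such request line; the file URI is a hypothetical placeholder for the URI returned by your own upload, and the MIME type must match the uploaded file.

```python
import json

# Hypothetical placeholder: substitute the URI returned by your own
# File API upload (uploaded_file.uri in the Python SDK).
uploaded_image_uri = "https://generativelanguage.googleapis.com/v1beta/files/your-file-id"

request_line = {
    "key": "request-with-image",
    "request": {
        "contents": [{
            "parts": [
                {"text": "Describe what is happening in this image."},
                # A file_data part points at an uploaded file by URI
                # instead of embedding the bytes inline.
                {"file_data": {
                    "file_uri": uploaded_image_uri,
                    "mime_type": "image/jpeg"
                }}
            ],
            "role": "user"
        }]
    }
}

line = json.dumps(request_line)
```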

The following example calls the BatchGenerateContent method with an input file uploaded using the File API:

Python


# Assumes `uploaded_file` is the file object from the previous step
file_batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_file.name,
    config={
        'display_name': "file-upload-job-1",
    },
)

print(f"Created batch job: {file_batch_job.name}")

REST

BATCH_INPUT_FILE='files/123456' # File ID
curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \
-X POST \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type:application/json" \
-d "{
    'batch': {
        'display_name': 'my-batch-requests',
        'input_config': {
            'requests': {
                'file_name': '${BATCH_INPUT_FILE}'
            }
        }
    }
}"

When you create a batch job, the job name is returned. Use this name to monitor the job's status and to retrieve the results once the job completes.

The following is example output that contains a job name:


Created batch job from file: batches/123456789

Request configuration

You can include any request configuration that you would use in a standard non-batch request. For example, you can specify the temperature, system instructions, or even pass in other modalities. The following example shows inline requests where one of them includes a system instruction:

inline_requests_list = [
    {'contents': [{'parts': [{'text': 'Write a short poem about a cloud.'}]}]},
    {'contents': [{'parts': [{'text': 'Write a short poem about a cat.'}]}], 'system_instruction': {'parts': [{'text': 'You are a cat. Your name is Neko.'}]}}
]

Similarly, you can specify the tools to use for a request. The following example shows a request that enables the Google Search tool:

inline_requests_list = [
    {'contents': [{'parts': [{'text': 'Who won the euro 1998?'}]}]},
    {'contents': [{'parts': [{'text': 'Who won the euro 2025?'}]}], 'tools': [{'google_search': {}}]}
]

You can also specify structured output. The following example shows how to specify it for your batch requests.

import time

from google import genai
from pydantic import BaseModel, TypeAdapter

class Recipe(BaseModel):
    recipe_name: str
    ingredients: list[str]

client = genai.Client()

# A list of dictionaries, where each is a GenerateContentRequest
inline_requests = [
    {
        'contents': [{
            'parts': [{'text': 'List a few popular cookie recipes, and include the amounts of ingredients.'}],
            'role': 'user'
        }],
        'config': {
            'response_mime_type': 'application/json',
            'response_schema': list[Recipe]
        }
    },
    {
        'contents': [{
            'parts': [{'text': 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.'}],
            'role': 'user'
        }],
        'config': {
            'response_mime_type': 'application/json',
            'response_schema': list[Recipe]
        }
    }
]

inline_batch_job = client.batches.create(
    model="models/gemini-2.5-flash",
    src=inline_requests,
    config={
        'display_name': "structured-output-job-1"
    },
)

# wait for the job to finish
job_name = inline_batch_job.name
print(f"Polling status for job: {job_name}")

while True:
    batch_job_inline = client.batches.get(name=job_name)
    if batch_job_inline.state.name in ('JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED', 'JOB_STATE_EXPIRED'):
        break
    print(f"Job not finished. Current state: {batch_job_inline.state.name}. Waiting 30 seconds...")
    time.sleep(30)

print(f"Job finished with state: {batch_job_inline.state.name}")

# print the response
for i, inline_response in enumerate(batch_job_inline.dest.inlined_responses):
    print(f"\n--- Response {i+1} ---")

    # Check for a successful response
    if inline_response.response:
        # The .text property is a shortcut to the generated text.
        print(inline_response.response.text)

Monitoring job status

Use the job name obtained when creating the batch job to poll its status. The state field of the batch job indicates its current status. A batch job can be in one of the following states:

  • JOB_STATE_PENDING: The job has been created and is waiting to be processed by the service.
  • JOB_STATE_RUNNING: The job is in progress.
  • JOB_STATE_SUCCEEDED: The job completed successfully. You can now retrieve the results.
  • JOB_STATE_FAILED: The job failed. Check the error details for more information.
  • JOB_STATE_CANCELLED: The user cancelled the job.
  • JOB_STATE_EXPIRED: The job expired because it was running or pending for more than 48 hours. The job will not have any results to retrieve. You can try submitting the job again or splitting the requests into smaller batches.
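If a job expires, one recovery option is to split the requests into smaller batches. A minimal chunking sketch follows; the chunk size of 100 is an arbitrary illustration, not an API limit.

```python
def split_into_batches(requests, batch_size):
    """Split a list of request dicts into consecutive chunks of at most batch_size."""
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]

# Example: 250 placeholder requests split into chunks of 100.
all_requests = [
    {"contents": [{"parts": [{"text": f"Prompt {n}"}]}]}
    for n in range(250)
]
batches = split_into_batches(all_requests, 100)
print([len(b) for b in batches])  # [100, 100, 50]
```

Each chunk can then be submitted as its own job with client.batches.create.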

You can poll the job's status periodically to check for completion.

Python


import time

# Use the name of the job you want to check
# e.g., inline_batch_job.name from the previous step
job_name = "YOUR_BATCH_JOB_NAME"  # (e.g. 'batches/your-batch-id')

completed_states = set([
    'JOB_STATE_SUCCEEDED',
    'JOB_STATE_FAILED',
    'JOB_STATE_CANCELLED',
    'JOB_STATE_EXPIRED',
])

print(f"Polling status for job: {job_name}")
batch_job = client.batches.get(name=job_name) # Initial get
while batch_job.state.name not in completed_states:
  print(f"Current state: {batch_job.state.name}")
  time.sleep(30) # Wait for 30 seconds before polling again
  batch_job = client.batches.get(name=job_name)

print(f"Job finished with state: {batch_job.state.name}")
if batch_job.state.name == 'JOB_STATE_FAILED':
    print(f"Error: {batch_job.error}")

Retrieving results

When the job status indicates that the batch job has succeeded, the results are available in the response field.

Python

import json

# Use the name of the job you want to check
# e.g., inline_batch_job.name from the previous step
job_name = "YOUR_BATCH_JOB_NAME"
batch_job = client.batches.get(name=job_name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':

    # If batch job was created with a file
    if batch_job.dest and batch_job.dest.file_name:
        # Results are in a file
        result_file_name = batch_job.dest.file_name
        print(f"Results are in file: {result_file_name}")

        print("Downloading result file content...")
        file_content = client.files.download(file=result_file_name)
        # Process file_content (bytes) as needed
        print(file_content.decode('utf-8'))

    # If batch job was created with inline request
    elif batch_job.dest and batch_job.dest.inlined_responses:
        # Results are inline
        print("Results are inline:")
        for i, inline_response in enumerate(batch_job.dest.inlined_responses):
            print(f"Response {i+1}:")
            if inline_response.response:
                # Accessing response, structure may vary.
                try:
                    print(inline_response.response.text)
                except AttributeError:
                    print(inline_response.response) # Fallback
            elif inline_response.error:
                print(f"Error: {inline_response.error}")
    else:
        print("No results found (neither file nor inline).")
else:
    print(f"Job did not succeed. Final state: {batch_job.state.name}")
    if batch_job.error:
        print(f"Error: {batch_job.error}")

REST

BATCH_NAME="batches/123456" # Your batch job name

curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type:application/json" 2> /dev/null > batch_status.json

if jq -r '.done' batch_status.json | grep -q "false"; then
    echo "Batch has not finished processing"
fi

batch_state=$(jq -r '.metadata.state' batch_status.json)
if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]]; then
    if [[ $(jq '.response | has("inlinedResponses")' batch_status.json) = "true" ]]; then
        jq -r '.response.inlinedResponses' batch_status.json
        exit
    fi
    responses_file_name=$(jq -r '.response.responsesFile' batch_status.json)
    curl https://generativelanguage.googleapis.com/download/v1beta/$responses_file_name:download?alt=media \
    -H "x-goog-api-key: $GEMINI_API_KEY" 2> /dev/null
elif [[ $batch_state = "JOB_STATE_FAILED" ]]; then
    jq '.error' batch_status.json
elif [[ $batch_state == "JOB_STATE_CANCELLED" ]]; then
    echo "Batch was cancelled by the user"
elif [[ $batch_state == "JOB_STATE_EXPIRED" ]]; then
    echo "Batch expired after 48 hours"
fi

Cancelling a batch job

You can cancel an in-progress batch job using its name. When a job is cancelled, it stops processing new requests.

Python

# Cancel a batch job
client.batches.cancel(name=batch_job_to_cancel.name)

REST

BATCH_NAME="batches/123456" # Your batch job name

# Cancel the batch
curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME:cancel \
-H "x-goog-api-key: $GEMINI_API_KEY"

# Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED
curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type:application/json" 2> /dev/null | jq -r '.metadata.state'

Deleting a batch job

You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs.

Python

# Delete a batch job
client.batches.delete(name=batch_job_to_delete.name)

REST

BATCH_NAME="batches/123456" # Your batch job name

# Delete the batch job
curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME:delete \
-H "x-goog-api-key: $GEMINI_API_KEY"

Technical details

  • Supported models: Batch Mode supports a range of Gemini models. To find out whether a given model supports Batch Mode, see the models page. Batch Mode supports the same modalities as the interactive (non-batch) API.
  • Pricing: Batch Mode usage is billed at 50% of the standard interactive API cost for the equivalent model. See the pricing page for details, and the rate limits page for the rate limits that apply to this feature.
  • Service Level Objective (SLO): Batch jobs are designed to complete within 24 hours. Many jobs may complete much faster, depending on their size and the current system load.
  • Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch traffic.

Best practices

  • Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting the request size limit of the BatchGenerateContent call itself. Note that there is a 2 GB size limit per input file.
  • Error handling: Check batchStats for failedRequestCount after a job completes. If you are using file output, parse each line to check whether it is a GenerateContentResponse or a status object indicating an error for that specific request. See the troubleshooting guide for the complete list of error codes.
  • Submit jobs only once: Batch job creation is not idempotent. Sending the same creation request twice creates two separate batch jobs.
  • Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary with system load and job size. For large jobs, consider breaking them into smaller batches if you need intermediate results sooner.
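The error-handling advice above can be sketched as a small parser for the downloaded result JSONL: each line carries the user-defined key plus either a response or an error/status object. The field names below mirror the output format described earlier on this page, but treat this as a sketch to adapt, not a definitive schema; the two sample lines are fabricated for illustration.

```python
import json

# Two illustrative output lines: one success, one failure.
result_jsonl = "\n".join([
    '{"key": "request-1", "response": {"candidates": [{"content": {"parts": [{"text": "Hello!"}]}}]}}',
    '{"key": "request-2", "error": {"code": 429, "message": "Resource exhausted"}}',
])

successes, failures = {}, {}
for line in result_jsonl.splitlines():
    record = json.loads(line)
    key = record["key"]
    if "response" in record:
        successes[key] = record["response"]
    else:
        # Anything without a response is treated as a failed request.
        failures[key] = record.get("error") or record.get("status")

print(sorted(successes), sorted(failures))  # ['request-1'] ['request-2']
```

Failed keys can then be collected into a new, smaller input file and resubmitted.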

What's next

For more examples, see the Batch Mode notebook.