The Gemini API's Batch Mode is designed to process large volumes of requests asynchronously at 50% of the standard cost. The target turnaround time is 24 hours, but in most cases jobs complete much sooner.
Use Batch Mode for large-scale, non-urgent workloads that don't need an immediate response, such as data pre-processing or running evaluations.
Creating a batch job
There are two ways to submit requests in Batch Mode:
- Inline requests: A list of GenerateContentRequest objects included directly in the batch creation request. This is suitable for smaller batches where the total request size stays under 20 MB (see the sizing sketch after this list). The output returned by the model is a list of inlineResponse objects.
- Input file: A JSON Lines (JSONL) file in which each line contains a complete GenerateContentRequest object. This method is recommended for larger request sets. The output returned by the model is a JSONL file in which each line is either a GenerateContentResponse or a status object.
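The 20 MB limit applies to the total serialized size of the inline payload. As a quick way to decide which path to use, the following minimal sketch estimates the payload size up front; the choose_submission_mode helper and its exact threshold handling are illustrative, not part of the SDK:

import json

# Illustrative helper (not part of the SDK): estimate the serialized size of an
# inline batch payload and fall back to a JSONL input file near the 20 MB limit.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # 20 MB, as described above

def choose_submission_mode(requests: list[dict]) -> str:
    payload_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in requests)
    return "inline" if payload_bytes < INLINE_LIMIT_BYTES else "file"

requests = [
    {"contents": [{"parts": [{"text": "Tell me a one-sentence joke."}], "role": "user"}]},
    {"contents": [{"parts": [{"text": "Why is the sky blue?"}], "role": "user"}]},
]
print(choose_submission_mode(requests))  # -> 'inline' for a payload this small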
Inline requests
If you don't have many requests, you can embed the GenerateContentRequest objects directly in the BatchGenerateContentRequest. The following example calls the BatchGenerateContent method with inline requests:
Python
from google import genai
from google.genai import types

client = genai.Client()

# A list of dictionaries, where each is a GenerateContentRequest
inline_requests = [
    {
        'contents': [{
            'parts': [{'text': 'Tell me a one-sentence joke.'}],
            'role': 'user'
        }]
    },
    {
        'contents': [{
            'parts': [{'text': 'Why is the sky blue?'}],
            'role': 'user'
        }]
    }
]

inline_batch_job = client.batches.create(
    model="models/gemini-2.5-flash",
    src=inline_requests,
    config={
        'display_name': "inlined-requests-job-1",
    },
)

print(f"Created batch job: {inline_batch_job.name}")
REST
curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -X POST \
  -H "Content-Type:application/json" \
  -d '{
    "batch": {
      "display_name": "my-batch-requests",
      "input_config": {
        "requests": {
          "requests": [
            {
              "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]},
              "metadata": {
                "key": "request-1"
              }
            },
            {
              "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]},
              "metadata": {
                "key": "request-2"
              }
            }
          ]
        }
      }
    }
  }'
Input file
To handle larger numbers of requests, prepare a JSON Lines (JSONL) file. Each line of this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, a request defined with the key request-1 will have its response annotated with that same key name.
This file is uploaded using the File API. The maximum allowed size for an input file is 2 GB.
The following is an example JSONL file. You can save it in a file named my-batch-requests.jsonl:
{"key": "request-1", "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generation_config": {"temperature": 0.7}}}
{"key": "request-2", "request": {"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}}
As with inline requests, you can specify other parameters in each request JSON, such as system instructions, tools, or other configurations.
You can upload this file using the File API, as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file (a sketch of such a reference follows the upload examples below).
Python
import json

from google import genai
from google.genai import types

client = genai.Client()

# Create a sample JSONL file
with open("my-batch-requests.jsonl", "w") as f:
    requests = [
        {"key": "request-1", "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}},
        {"key": "request-2", "request": {"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}}
    ]
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload the file to the File API
uploaded_file = client.files.upload(
    file='my-batch-requests.jsonl',
    config=types.UploadFileConfig(display_name='my-batch-requests', mime_type='jsonl')
)

print(f"Uploaded file: {uploaded_file.name}")
REST
tmp_batch_input_file=batch_input.tmp
echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp
MIME_TYPE=$(file -b --mime-type "${tmp_batch_input_file}")
NUM_BYTES=$(wc -c < "${tmp_batch_input_file}")
DISPLAY_NAME=BatchInput

tmp_header_file=upload-header.tmp

# Initial resumable request defining metadata.
# The upload url is in the response headers; dump them to a file.
curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \
  -D "${tmp_header_file}" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "X-Goog-Upload-Protocol: resumable" \
  -H "X-Goog-Upload-Command: start" \
  -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
  -H "Content-Type: application/jsonl" \
  -d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null

upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"

# Upload the actual bytes.
curl "${upload_url}" \
  -H "Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Offset: 0" \
  -H "X-Goog-Upload-Command: upload, finalize" \
  --data-binary "@${tmp_batch_input_file}" 2> /dev/null > file_info.json

file_uri=$(jq ".file.uri" file_info.json)
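If your batch includes multimodal input, a JSONL line can reference files you have already uploaded. The following minimal sketch assumes the standard file_data part structure and a placeholder file URI (files/abc123 does not exist; substitute the uri of your own uploaded file):

import json

# Hypothetical multimodal request line; the file URI is a placeholder for the
# uri returned when you uploaded the file (e.g. uploaded_file.uri).
multimodal_request = {
    "key": "request-3",
    "request": {
        "contents": [{
            "parts": [
                {"text": "Describe what is shown in this image."},
                {"file_data": {"file_uri": "https://generativelanguage.googleapis.com/v1beta/files/abc123", "mime_type": "image/jpeg"}}
            ],
            "role": "user"
        }]
    }
}

# Append the line to the same JSONL input file before uploading it.
with open("my-batch-requests.jsonl", "a") as f:
    f.write(json.dumps(multimodal_request) + "\n")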
The following example calls the BatchGenerateContent method with an input file uploaded using the File API:
Python
# Assumes `uploaded_file` is the file object from the previous step
file_batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_file.name,
    config={
        'display_name': "file-upload-job-1",
    },
)

print(f"Created batch job: {file_batch_job.name}")
REST
BATCH_INPUT_FILE='files/123456' # File ID
curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \
  -X POST \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type:application/json" \
  -d "{
    'batch': {
      'display_name': 'my-batch-requests',
      'input_config': {
        'requests': {
          'file_name': '${BATCH_INPUT_FILE}'
        }
      }
    }
  }"
When you create a batch job, the job name is returned. Use this name to monitor the job status and to retrieve the results once the job is complete.
The following is an example output containing a job name:
Created batch job from file: batches/123456789
Request configuration
You can include any request configuration that you would use in a standard, non-batch request. For example, you can specify the temperature, system instructions, or even pass in other modalities. The following example shows an inline request that includes a system instruction for one of the requests:
inline_requests_list = [
    {'contents': [{'parts': [{'text': 'Write a short poem about a cloud.'}]}]},
    {'contents': [{'parts': [{'text': 'Write a short poem about a cat.'}]}], 'system_instructions': {'parts': [{'text': 'You are a cat. Your name is Neko.'}]}}
]
Similarly, you can specify the tools to use for a request. The following example shows a request that enables the Google Search tool:
inline_requests_list = [
    {'contents': [{'parts': [{'text': 'Who won the euro 1998?'}]}]},
    {'contents': [{'parts': [{'text': 'Who won the euro 2025?'}]}], 'tools': [{'google_search': {}}]}
]
You can also specify structured output. The following example shows how to specify it for your batch requests.
import time

from google import genai
from pydantic import BaseModel, TypeAdapter

class Recipe(BaseModel):
    recipe_name: str
    ingredients: list[str]

client = genai.Client()

# A list of dictionaries, where each is a GenerateContentRequest
inline_requests = [
    {
        'contents': [{
            'parts': [{'text': 'List a few popular cookie recipes, and include the amounts of ingredients.'}],
            'role': 'user'
        }],
        'config': {
            'response_mime_type': 'application/json',
            'response_schema': list[Recipe]
        }
    },
    {
        'contents': [{
            'parts': [{'text': 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.'}],
            'role': 'user'
        }],
        'config': {
            'response_mime_type': 'application/json',
            'response_schema': list[Recipe]
        }
    }
]

inline_batch_job = client.batches.create(
    model="models/gemini-2.5-flash",
    src=inline_requests,
    config={
        'display_name': "structured-output-job-1"
    },
)

# Wait for the job to finish
job_name = inline_batch_job.name
print(f"Polling status for job: {job_name}")

while True:
    batch_job_inline = client.batches.get(name=job_name)
    if batch_job_inline.state.name in ('JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED', 'JOB_STATE_EXPIRED'):
        break
    print(f"Job not finished. Current state: {batch_job_inline.state.name}. Waiting 30 seconds...")
    time.sleep(30)

print(f"Job finished with state: {batch_job_inline.state.name}")

# Print the responses
for i, inline_response in enumerate(batch_job_inline.dest.inlined_responses):
    print(f"\n--- Response {i+1} ---")

    # Check for a successful response
    if inline_response.response:
        # The .text property is a shortcut to the generated text.
        print(inline_response.response.text)
Monitoring job status
Use the job name you received when creating the batch job to poll its status. The state field of a batch job indicates its current state. A batch job can be in one of the following states:
- JOB_STATE_PENDING: The job has been created and is waiting to be processed by the service.
- JOB_STATE_RUNNING: The job is in progress.
- JOB_STATE_SUCCEEDED: The job completed successfully. You can now retrieve the results.
- JOB_STATE_FAILED: The job failed. Check the error details for more information.
- JOB_STATE_CANCELLED: The job was cancelled by the user.
- JOB_STATE_EXPIRED: The job expired because it was running or pending for more than 48 hours. The job has no results to retrieve. You can try submitting the job again or splitting the requests into smaller batches.
You can poll the job status periodically to check for completion.
Python
# Use the name of the job you want to check
# e.g., inline_batch_job.name from the previous step
job_name = "YOUR_BATCH_JOB_NAME"  # (e.g. 'batches/your-batch-id')
batch_job = client.batches.get(name=job_name)

completed_states = set([
    'JOB_STATE_SUCCEEDED',
    'JOB_STATE_FAILED',
    'JOB_STATE_CANCELLED',
    'JOB_STATE_EXPIRED',
])

print(f"Polling status for job: {job_name}")
batch_job = client.batches.get(name=job_name)  # Initial get
while batch_job.state.name not in completed_states:
    print(f"Current state: {batch_job.state.name}")
    time.sleep(30)  # Wait for 30 seconds before polling again
    batch_job = client.batches.get(name=job_name)

print(f"Job finished with state: {batch_job.state.name}")
if batch_job.state.name == 'JOB_STATE_FAILED':
    print(f"Error: {batch_job.error}")
Retrieving results
Once the job status indicates that your batch job has succeeded, the results are available in the response field.
Python
import json

# Use the name of the job you want to check
# e.g., inline_batch_job.name from the previous step
job_name = "YOUR_BATCH_JOB_NAME"
batch_job = client.batches.get(name=job_name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':

    # If batch job was created with a file
    if batch_job.dest and batch_job.dest.file_name:
        # Results are in a file
        result_file_name = batch_job.dest.file_name
        print(f"Results are in file: {result_file_name}")

        print("Downloading result file content...")
        file_content = client.files.download(file=result_file_name)
        # Process file_content (bytes) as needed
        print(file_content.decode('utf-8'))

    # If batch job was created with inline request
    elif batch_job.dest and batch_job.dest.inlined_responses:
        # Results are inline
        print("Results are inline:")
        for i, inline_response in enumerate(batch_job.dest.inlined_responses):
            print(f"Response {i+1}:")

            if inline_response.response:
                # Accessing response, structure may vary.
                try:
                    print(inline_response.response.text)
                except AttributeError:
                    print(inline_response.response)  # Fallback
            elif inline_response.error:
                print(f"Error: {inline_response.error}")
    else:
        print("No results found (neither file nor inline).")
else:
    print(f"Job did not succeed. Final state: {batch_job.state.name}")
    if batch_job.error:
        print(f"Error: {batch_job.error}")
REST
BATCH_NAME="batches/123456" # Your batch job name

curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type:application/json" 2> /dev/null > batch_status.json

if jq -r '.done' batch_status.json | grep -q "false"; then
  echo "Batch has not finished processing"
fi

batch_state=$(jq -r '.metadata.state' batch_status.json)

if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]]; then
  if [[ $(jq '.response | has("inlinedResponses")' batch_status.json) = "true" ]]; then
    jq -r '.response.inlinedResponses' batch_status.json
    exit
  fi
  responses_file_name=$(jq -r '.response.responsesFile' batch_status.json)
  curl "https://generativelanguage.googleapis.com/download/v1beta/$responses_file_name:download?alt=media" \
    -H "x-goog-api-key: $GEMINI_API_KEY" 2> /dev/null
elif [[ $batch_state = "JOB_STATE_FAILED" ]]; then
  jq '.error' batch_status.json
elif [[ $batch_state == "JOB_STATE_CANCELLED" ]]; then
  echo "Batch was cancelled by the user"
elif [[ $batch_state == "JOB_STATE_EXPIRED" ]]; then
  echo "Batch expired after 48 hours"
fi
Cancelling a batch job
You can cancel an in-progress batch job using its name. When a job is cancelled, it stops processing new requests.
Python
# Cancel a batch job
client.batches.cancel(name=batch_job_to_cancel.name)
REST
BATCH_NAME="batches/123456" # Your batch job name

# Cancel the batch
curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME:cancel \
  -H "x-goog-api-key: $GEMINI_API_KEY"

# Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED
curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type:application/json" 2> /dev/null | jq -r '.metadata.state'
Deleting a batch job
You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs.
Python
# Delete a batch job
client.batches.delete(name=batch_job_to_delete.name)
REST
BATCH_NAME="batches/123456" # Your batch job name
# Delete the batch job
curl https://generativelanguage.googleapis.com/v1beta/$BATCH_NAME:delete \
-H "x-goog-api-key: $GEMINI_API_KEY"
Technical details
- Supported models: Batch Mode supports a range of Gemini models. See the models page to check whether a given model supports Batch Mode. Batch Mode supports the same modalities as the interactive (non-batch) API.
- Pricing: Batch Mode usage is billed at 50% of the standard interactive API cost for the equivalent model (a simple cost sketch follows this list). See the pricing page for details, and the rate limits page for the rate limits that apply to this feature.
- Service level objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and the current system load.
- Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch-mode traffic.
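As a back-of-the-envelope illustration of the 50% discount, the sketch below applies it to hypothetical standard per-token rates; the rates are placeholders only, so refer to the pricing page for the actual numbers for your model:

# Illustrative only: the per-token rates below are placeholders, not real prices.
STANDARD_INPUT_PRICE_PER_1M_TOKENS = 1.00   # hypothetical interactive-API rate (USD)
STANDARD_OUTPUT_PRICE_PER_1M_TOKENS = 4.00  # hypothetical interactive-API rate (USD)
BATCH_DISCOUNT = 0.5  # Batch Mode is billed at 50% of the standard rate

def estimate_batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a batch job under the hypothetical rates above."""
    standard_cost = (
        input_tokens / 1_000_000 * STANDARD_INPUT_PRICE_PER_1M_TOKENS
        + output_tokens / 1_000_000 * STANDARD_OUTPUT_PRICE_PER_1M_TOKENS
    )
    return standard_cost * BATCH_DISCOUNT

# e.g. 10M input tokens and 2M output tokens
print(f"Estimated cost: ${estimate_batch_cost(10_000_000, 2_000_000):.2f}")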
Best practices
- Use input files for large request volumes: For large numbers of requests, always use the file input method for better manageability and to avoid hitting the request size limit of the BatchGenerateContent call itself. Note that each input file is limited to 2 GB.
- Error handling: After a job completes, check batchStats for failedRequestCount. If you used file output, parse each line to check whether it is a GenerateContentResponse or a status object indicating an error for that specific request (a parsing sketch follows this list). See the troubleshooting guide for the complete set of error codes.
- Submit jobs only once: Batch job creation is not idempotent. If you send the same creation request twice, two separate batch jobs are created.
- Break up very large batches: While the target turnaround time is 24 hours, actual processing time varies with system load and job size. For large jobs, consider breaking them into smaller batches if you need intermediate results sooner.
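To complement the error-handling advice above, here is a minimal sketch of parsing a downloaded result file saved locally as batch_results.jsonl (a hypothetical filename). It assumes each output line carries the user-defined key alongside either a response or an error/status field; adjust the field names to what your result file actually contains:

import json

# Hypothetical parsing of a downloaded batch result file. Field names are
# assumptions based on the output format described above; adjust as needed.
succeeded, failed = {}, {}

with open("batch_results.jsonl") as f:
    for line in f:
        record = json.loads(line)
        key = record.get("key")
        if "response" in record:
            succeeded[key] = record["response"]
        else:
            # Anything without a response is treated as a status/error object here.
            failed[key] = record.get("error") or record.get("status") or record

print(f"{len(succeeded)} succeeded, {len(failed)} failed")
for key, err in failed.items():
    print(f"{key}: {err}")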
What's next
Check out the batch mode notebook for more examples.