文件解讀
Gemini 模型可處理 PDF 格式的文件,並運用原生視覺功能瞭解整份文件的內容。這項功能不僅能擷取文字,還可讓 Gemini 執行下列動作:
- 分析及解讀內容,包括文字、圖片、圖表、圖表和表格,即使是長達 1000 頁的文件也沒問題。
- 以結構化輸出格式擷取資訊。
- 根據文件中的圖像和文字元素,摘要內容並回答問題。
- 轉錄文件內容 (例如轉錄為 HTML),保留版面配置和格式,供下游應用程式使用。
您也可以傳送非 PDF 文件,但 Gemini 會將其視為一般文字,因此圖表或格式等脈絡資訊會消失。
內嵌傳遞 PDF 資料
您可以在要求中內嵌傳遞 PDF 資料。這種方法最適合用於小型文件或臨時處理,因為您不需要在後續要求中參照該檔案。如果需要參照大型文件進行多輪互動,建議使用 Files API,以縮短要求延遲時間並減少頻寬用量。
以下範例說明如何內嵌傳遞 PDF 資料:
Python
from google import genai
import base64
client = genai.Client()
with open('path/to/document.pdf', 'rb') as f:
pdf_bytes = f.read()
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{
"type": "document",
"data": base64.b64encode(pdf_bytes).decode('utf-8'),
"mime_type": "application/pdf"
},
{"type": "text", "text": "Summarize this document"}
]
)
print(interaction.steps[-1].content[0].text)
JavaScript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";
const ai = new GoogleGenAI({});
async function main() {
const pdfData = fs.readFileSync("path/to/document.pdf", {
encoding: "base64"
});
const interaction = await ai.interactions.create({
model: "gemini-3-flash-preview",
input: [
{ type: "text", text: "Summarize this document" },
{
type: "document",
data: pdfData,
mimeType: "application/pdf"
}
]
});
console.log(interaction.steps.at(-1).content[0].text);
}
main();
REST
PDF_PATH="path/to/document.pdf"
if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
B64FLAGS="--input"
else
B64FLAGS="-w0"
fi
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-flash-preview",
"input": [
{
"type": "document",
"data": "'$(base64 $B64FLAGS $PDF_PATH)'",
"mimeType": "application/pdf"
},
{"type": "text", "text": "Summarize this document"}
]
}'
您也可以上傳本機 PDF 檔案進行處理:
Python
from google import genai
client = genai.Client()
uploaded_file = client.files.upload(file="file.pdf")
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "document", "uri": uploaded_file.uri, "mime_type": uploaded_file.mime_type},
{"type": "text", "text": "Summarize this document"}
]
)
print(interaction.steps[-1].content[0].text)
JavaScript
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
async function main() {
const uploadedFile = await ai.files.upload({
file: "file.pdf",
config: { mimeType: "application/pdf" }
});
const interaction = await ai.interactions.create({
model: "gemini-3-flash-preview",
input: [
{ type: "text", text: "Summarize this document" },
{
type: "document",
uri: uploadedFile.uri,
mimeType: uploadedFile.mimeType
}
]
});
console.log(interaction.steps.at(-1).content[0].text);
}
main();
使用 Files API 上傳 PDF
如果檔案較大,或是打算在多個要求中重複使用文件,建議使用 Files API。這項做法可將檔案上傳作業與模型要求分離,進而縮短要求延遲時間並減少頻寬用量。
透過網址上傳大型 PDF
使用 File API 簡化從網址上傳及處理大型 PDF 檔案的程序:
Python
from google import genai
import io
import httpx
client = genai.Client()
long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"
# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)
sample_doc = client.files.upload(
# You can pass a path or a file-like object here
file=doc_io,
config=dict(
mime_type='application/pdf')
)
prompt = "Summarize this document"
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "document", "uri": sample_doc.uri, "mime_type": sample_doc.mime_type},
{"type": "text", "text": prompt}
]
)
print(interaction.steps[-1].content[0].text)
JavaScript
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
async function main() {
const pdfBuffer = await fetch("https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf")
.then((response) => response.arrayBuffer());
const fileBlob = new Blob([pdfBuffer], { type: 'application/pdf' });
const file = await ai.files.upload({
file: fileBlob,
config: {
displayName: 'A17_FlightPlan.pdf',
},
});
// Wait for the file to be processed.
let getFile = await ai.files.get({ name: file.name });
while (getFile.state === 'PROCESSING') {
getFile = await ai.files.get({ name: file.name });
console.log(`current file status: ${getFile.state}`);
console.log('File is still processing, retrying in 5 seconds');
await new Promise((resolve) => {
setTimeout(resolve, 5000);
});
}
if (file.state === 'FAILED') {
throw new Error('File processing failed.');
}
const interaction = await ai.interactions.create({
model: 'gemini-3-flash-preview',
input: [
{ type: "document", uri: file.uri, mimeType: file.mimeType },
{ type: "text", text: "Summarize this document" }
],
});
console.log(interaction.steps.at(-1).content[0].text);
}
main();
REST
PDF_PATH="https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"
DISPLAY_NAME="A17_FlightPlan"
PROMPT="Summarize this document"
# Download the PDF from the provided URL
wget -O "${DISPLAY_NAME}.pdf" "${PDF_PATH}"
MIME_TYPE=$(file -b --mime-type "${DISPLAY_NAME}.pdf")
NUM_BYTES=$(wc -c < "${DISPLAY_NAME}.pdf")
echo "MIME_TYPE: ${MIME_TYPE}"
echo "NUM_BYTES: ${NUM_BYTES}"
tmp_header_file=upload-header.tmp
# Initial resumable request defining metadata.
# The upload url is in the response headers dump them to a file.
curl "https://generativelanguage.googleapis.com/upload/v1beta/files?key=${GOOGLE_API_KEY}" \
-D upload-header.tmp \
-H "X-Goog-Upload-Protocol: resumable" \
-H "X-Goog-Upload-Command: start" \
-H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
-H "Content-Type: application/json" \
-d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null
upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"
# Upload the actual bytes.
curl "${upload_url}" \
-H "Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Offset: 0" \
-H "X-Goog-Upload-Command: upload, finalize" \
--data-binary "@${DISPLAY_NAME}.pdf" 2> /dev/null > file_info.json
file_uri=$(jq ".file.uri" file_info.json)
echo "file_uri: ${file_uri}"
# Now create an interaction using that file
curl "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "x-goog-api-key: $GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"model": "gemini-3-flash-preview",
"input": [
{"type": "text", "text": "'$PROMPT'"},
{"type": "document", "uri": '$file_uri', "mimeType": "application/pdf"}
]
}' 2> /dev/null > response.json
cat response.json
echo
jq ".steps[-1].content[0].text" response.json
# Clean up the downloaded PDF
rm "${DISPLAY_NAME}.pdf"
儲存在本機的大型 PDF
Python
from google import genai
import pathlib
client = genai.Client()
# Upload the PDF using the File API
file_path = pathlib.Path('large_file.pdf')
sample_file = client.files.upload(
file=file_path,
)
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "document", "uri": sample_file.uri, "mime_type": sample_file.mime_type},
{"type": "text", "text": "Summarize this document"}
]
)
print(interaction.steps[-1].content[0].text)
JavaScript
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
async function main() {
const file = await ai.files.upload({
file: 'path-to-localfile.pdf',
config: {
displayName: 'A17_FlightPlan.pdf',
},
});
// Wait for the file to be processed.
let getFile = await ai.files.get({ name: file.name });
while (getFile.state === 'PROCESSING') {
getFile = await ai.files.get({ name: file.name });
console.log(`current file status: ${getFile.state}`);
console.log('File is still processing, retrying in 5 seconds');
await new Promise((resolve) => {
setTimeout(resolve, 5000);
});
}
if (file.state === 'FAILED') {
throw new Error('File processing failed.');
}
const interaction = await ai.interactions.create({
model: 'gemini-3-flash-preview',
input: [
{ type: "document", uri: file.uri, mimeType: file.mimeType },
{ type: "text", text: "Summarize this document" }
],
});
console.log(interaction.steps.at(-1).content[0].text);
}
main();
REST
PDF_PATH="path/to/large_file.pdf"
NUM_BYTES=$(wc -c < "${PDF_PATH}")
DISPLAY_NAME=TEXT
tmp_header_file=upload-header.tmp
# Initial resumable request defining metadata.
# The upload url is in the response headers dump them to a file.
curl "https://generativelanguage.googleapis.com/upload/v1beta/files?key=${GEMINI_API_KEY}" \
-D upload-header.tmp \
-H "X-Goog-Upload-Protocol: resumable" \
-H "X-Goog-Upload-Command: start" \
-H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Header-Content-Type: application/pdf" \
-H "Content-Type: application/json" \
-d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null
upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"
# Upload the actual bytes.
curl "${upload_url}" \
-H "Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Offset: 0" \
-H "X-Goog-Upload-Command: upload, finalize" \
--data-binary "@${PDF_PATH}" 2> /dev/null > file_info.json
file_uri=$(jq ".file.uri" file_info.json)
echo file_uri=$file_uri
# Now create an interaction using that file
curl "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"model": "gemini-3-flash-preview",
"input": [
{"type": "document", "uri": '$file_uri', "mimeType": "application/pdf"},
{"type": "text", "text": "Can you add a few more lines to this poem?"}
]
}' 2> /dev/null > response.json
cat response.json
echo
jq ".steps[-1].content[0].text" response.json
您可以呼叫 files.get,確認 API 已成功儲存上傳的檔案並取得其 metadata。只有 name (以及延伸的 uri) 是獨一無二。
Python
from google import genai
import pathlib
client = genai.Client()
fpath = pathlib.Path('example.pdf')
fpath.write_text('hello')
file = client.files.upload(file='example.pdf')
file_info = client.files.get(name=file.name)
print(file_info.model_dump_json(indent=4))
REST
name=$(jq ".file.name" file_info.json)
# Get the file of interest to check state
curl https://generativelanguage.googleapis.com/v1beta/files/$name?key=$GEMINI_API_KEY > file_info.json
# Print some information about the file you got
name=$(jq ".file.name" file_info.json)
echo name=$name
file_uri=$(jq ".file.uri" file_info.json)
echo file_uri=$file_uri
傳遞多個 PDF
只要文件和文字提示的總大小不超過模型的脈絡窗口,Gemini API 就能在單一要求中處理多個 PDF 文件 (最多 1000 頁)。
Python
from google import genai
import io
import httpx
client = genai.Client()
doc_url_1 = "https://arxiv.org/pdf/2312.11805"
doc_url_2 = "https://arxiv.org/pdf/2403.05530"
# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)
sample_pdf_1 = client.files.upload(
file=doc_data_1,
config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
file=doc_data_2,
config=dict(mime_type='application/pdf')
)
prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{"type": "document", "uri": sample_pdf_1.uri, "mime_type": sample_pdf_1.mime_type},
{"type": "document", "uri": sample_pdf_2.uri, "mime_type": sample_pdf_2.mime_type},
{"type": "text", "text": prompt}
]
)
print(interaction.steps[-1].content[0].text)
JavaScript
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
async function uploadRemotePDF(url, displayName) {
const pdfBuffer = await fetch(url)
.then((response) => response.arrayBuffer());
const fileBlob = new Blob([pdfBuffer], { type: 'application/pdf' });
const file = await ai.files.upload({
file: fileBlob,
config: {
displayName: displayName,
},
});
// Wait for the file to be processed.
let getFile = await ai.files.get({ name: file.name });
while (getFile.state === 'PROCESSING') {
getFile = await ai.files.get({ name: file.name });
console.log(`current file status: ${getFile.state}`);
console.log('File is still processing, retrying in 5 seconds');
await new Promise((resolve) => {
setTimeout(resolve, 5000);
});
}
if (file.state === 'FAILED') {
throw new Error('File processing failed.');
}
return file;
}
async function main() {
const file1 = await uploadRemotePDF("https://arxiv.org/pdf/2312.11805", "PDF 1");
const file2 = await uploadRemotePDF("https://arxiv.org/pdf/2403.05530", "PDF 2");
const interaction = await ai.interactions.create({
model: 'gemini-3-flash-preview',
input: [
{ type: "document", uri: file1.uri, mimeType: file1.mimeType },
{ type: "document", uri: file2.uri, mimeType: file2.mimeType },
{ type: "text", text: "What is the difference between each of the main benchmarks between these two papers? Output these in a table." }
],
});
console.log(interaction.steps.at(-1).content[0].text);
}
main();
REST
DOC_URL_1="https://arxiv.org/pdf/2312.11805"
DOC_URL_2="https://arxiv.org/pdf/2403.05530"
DISPLAY_NAME_1="Gemini_paper"
DISPLAY_NAME_2="Gemini_1.5_paper"
PROMPT="What is the difference between each of the main benchmarks between these two papers? Output these in a table."
# Function to download and upload a PDF
upload_pdf() {
local doc_url="$1"
local display_name="$2"
# Download the PDF
wget -O "${display_name}.pdf" "${doc_url}"
local MIME_TYPE=$(file -b --mime-type "${display_name}.pdf")
local NUM_BYTES=$(wc -c < "${display_name}.pdf")
echo "MIME_TYPE: ${MIME_TYPE}"
echo "NUM_BYTES: ${NUM_BYTES}"
local tmp_header_file=upload-header.tmp
# Initial resumable request
curl "https://generativelanguage.googleapis.com/upload/v1beta/files?key=${GOOGLE_API_KEY}" \
-D "${tmp_header_file}" \
-H "X-Goog-Upload-Protocol: resumable" \
-H "X-Goog-Upload-Command: start" \
-H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
-H "Content-Type: application/json" \
-d "{'file': {'display_name': '${display_name}'}}" 2> /dev/null
local upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"
# Upload the PDF
curl "${upload_url}" \
-H "Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Offset: 0" \
-H "X-Goog-Upload-Command: upload, finalize" \
--data-binary "@${display_name}.pdf" 2> /dev/null > "file_info_${display_name}.json"
local file_uri=$(jq ".file.uri" "file_info_${display_name}.json")
echo "file_uri for ${display_name}: ${file_uri}"
# Clean up the downloaded PDF
rm "${display_name}.pdf"
echo "${file_uri}"
}
# Upload the first PDF
file_uri_1=$(upload_pdf "${DOC_URL_1}" "${DISPLAY_NAME_1}")
# Upload the second PDF
file_uri_2=$(upload_pdf "${DOC_URL_2}" "${DISPLAY_NAME_2}")
# Now create an interaction using both files
curl "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "x-goog-api-key: $GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"model": "gemini-3-flash-preview",
"input": [
{"type": "document", "uri": '$file_uri_1', "mimeType": "application/pdf"},
{"type": "document", "uri": '$file_uri_2', "mimeType": "application/pdf"},
{"type": "text", "text": "'$PROMPT'"}
]
}' 2> /dev/null > response.json
cat response.json
echo
jq ".steps[-1].content[0].text" response.json
技術詳細資料
Gemini 支援的 PDF 檔案大小上限為 50 MB 或 1000 頁。這項限制適用於內嵌資料和 Files API 上傳作業。每頁文件相當於 258 個權杖。
除了模型的內容視窗外,文件中的像素數量沒有具體限制,但較大的頁面會縮放至 3072 x 3072 像素的最大解析度,同時保留原始長寬比,較小的頁面則會放大至 768 x 768 像素。如果頁面大小較小,除了頻寬外,不會有任何成本降低;如果頁面解析度較高,也不會提升效能。
Gemini 3 模型
Gemini 3 推出 media_resolution 參數,可精細控管多模態視覺處理作業。現在可以為每個媒體元件設定低、中或高解析度。新增這項功能後,處理 PDF 文件的方式也隨之更新:
- 內建文字納入:系統會擷取 PDF 中內建的文字,並提供給模型。
- 帳單與權杖報表:
- 從 PDF 擷取的原生文字產生的詞元不會計費。
- 在 API 回應的
usage_metadata部分,處理 PDF 頁面 (以圖片形式) 時產生的權杖現在會計入IMAGE模態,而不是像某些舊版一樣計入獨立的DOCUMENT模態。
文件類型
從技術上來說,您可以傳遞其他 MIME 類型,以供文件理解功能使用,例如 TXT、Markdown、HTML、XML 等。不過,文件視覺 只會解讀 PDF 檔案的內容。其他類型會以純文字形式擷取,模型無法解讀這些檔案的算繪內容。所有檔案類型專屬內容都會遺失,例如圖表、HTML 標記、Markdown 格式等。
如要瞭解其他檔案輸入方式,請參閱「檔案輸入方式」指南。
最佳做法
為確保最佳成效:
- 上傳前,請先將頁面旋轉至正確方向。
- 避免模糊的頁面。
- 如果使用單一頁面,請將文字提示詞放在該頁面之後。
後續步驟
如要進一步瞭解相關內容,請參閱下列資源: