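Document understanding

The Gemini API supports PDF input, including long documents (up to 1000 pages). Gemini models process PDFs with native vision, so they can understand both the text and the image content inside documents. With native PDF vision support, Gemini models can:

Analyze diagrams, charts, and tables inside documents
Extract information into structured output formats
Answer questions about the visual and text content of documents
Summarize documents
Transcribe document content (for example, to HTML), preserving layout and formatting, for use in downstream applications

This tutorial demonstrates several ways to use the Gemini API to process PDF documents.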

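PDF input

For PDF payloads under 20 MB, you can either upload base64-encoded documents or directly upload locally stored files.

As inline data

You can process PDF documents directly from URLs. The following code snippet shows how: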

from google import genai
from google.genai import types
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"

# Retrieve the PDF as raw bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-2.0-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

Technical details

Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types:

PDF - application/pdf
JavaScript - application/x-javascript, text/javascript
Python - application/x-python, text/x-python
TXT - text/plain
HTML - text/html
CSS - text/css
Markdown - text/md
CSV - text/csv
XML - text/xml
RTF - text/rtf
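
As a convenience, the MIME types listed above can be looked up from a file name before building a request. The following is a minimal sketch, not part of the Gemini SDK: the helper name mime_type_for and the file-extension mapping are assumptions made for illustration, and where the list gives two MIME types for a format, the sketch arbitrarily picks one.

```python
import pathlib

# Assumption: the extension-to-type mapping is illustrative; only the
# MIME type strings themselves come from the list above.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".js": "text/javascript",
    ".py": "text/x-python",
    ".txt": "text/plain",
    ".html": "text/html",
    ".css": "text/css",
    ".md": "text/md",
    ".csv": "text/csv",
    ".xml": "text/xml",
    ".rtf": "text/rtf",
}


def mime_type_for(path):
    """Return a supported MIME type for the file, based on its extension."""
    suffix = pathlib.Path(path).suffix.lower()
    if suffix not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported document type: {str(path)!r}")
    return SUPPORTED_MIME_TYPES[suffix]


print(mime_type_for("report.PDF"))
```

The returned string can be passed as the mime_type argument in the snippets in this tutorial.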

Each document page is equivalent to 258 tokens.
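
The per-page figure makes it easy to estimate how many tokens a document will consume. A minimal sketch of that arithmetic follows; the function name estimate_document_tokens is hypothetical, and the estimate covers only the document pages, not the text prompt.

```python
TOKENS_PER_PAGE = 258  # per-page figure stated above


def estimate_document_tokens(page_count):
    """Estimate the tokens used by the pages of a document."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * TOKENS_PER_PAGE


print(estimate_document_tokens(1000))  # 258000
```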

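There are no specific limits on the number of pixels in a document besides the model's context window. Larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, and smaller pages are scaled up to 768x768 pixels. Pages at lower sizes do not reduce cost (other than bandwidth), and pages at higher resolution do not improve performance.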
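
To make the scaling rule concrete, here is a rough sketch of how the resulting page size could be computed. The exact algorithm is not documented here, so this is an assumption-laden illustration: it assumes the limits apply to the longer side of the page, and that aspect ratio is preserved when scaling up as well as down. The function name scaled_page_size is hypothetical.

```python
MAX_SIDE = 3072  # larger pages are scaled down to at most this size
MIN_SIDE = 768   # smaller pages are scaled up to this size


def scaled_page_size(width, height):
    """Approximate the page size after scaling.

    Assumption: thresholds are applied to the longer side, and the
    aspect ratio is kept in both directions.
    """
    longest = max(width, height)
    if longest > MAX_SIDE:
        scale = MAX_SIDE / longest
    elif longest < MIN_SIDE:
        scale = MIN_SIDE / longest
    else:
        scale = 1.0
    return round(width * scale), round(height * scale)


print(scaled_page_size(6144, 3072))  # (3072, 1536)
print(scaled_page_size(384, 192))    # (768, 384)
```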

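For best results:

Rotate pages to the correct orientation before uploading.
Avoid blurry pages.
If using a single page, place the text prompt after the page.

Locally stored PDFs

For locally stored PDFs, you can use the following approach: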

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"

# Download the PDF and save it to a local file
filepath = pathlib.Path('file.pdf')
filepath.write_bytes(httpx.get(doc_url).content)

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-2.0-flash",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

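Large PDFs

You can use the File API to upload larger documents. Always use the File API when the total request size (including the files, text prompt, system instructions, and so on) is larger than 20 MB.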
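
The 20 MB rule can be turned into a small pre-flight check. The sketch below is illustrative only: the helper name choose_upload_path is hypothetical, "20 MB" is read as 20 * 1024 * 1024 bytes, and a request that lands exactly on the limit is conservatively routed to the File API. Both of those choices are assumptions, not documented behavior.

```python
# Assumption: "20 MB" is interpreted as 20 MiB.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def choose_upload_path(pdf_bytes, other_bytes=0):
    """Return "inline" or "file_api" for a request of the given size.

    pdf_bytes is the size of the document; other_bytes covers the text
    prompt, system instructions, and any other request content.
    """
    total = pdf_bytes + other_bytes
    # Assumption: treat a request exactly at the limit as too large.
    return "file_api" if total >= INLINE_LIMIT_BYTES else "inline"


print(choose_upload_path(5 * 1024 * 1024))   # inline
print(choose_upload_path(25 * 1024 * 1024))  # file_api
```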

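Call media.upload to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent.

Large PDFs from URLs

Use the File API for large PDF files available from URLs, which simplifies uploading and processing these documents directly through their URLs: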

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

sample_doc = client.files.upload(
  # You can pass a path or a file-like object here
  file=doc_io,
  config=dict(
    mime_type='application/pdf')
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-2.0-flash",
  contents=[sample_doc, prompt])
print(response.text)

Large PDFs stored locally

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"

# Retrieve the PDF
file_path = pathlib.Path('A17.pdf')
file_path.write_bytes(httpx.get(long_context_pdf_path).content)

# Upload the PDF using the File API
sample_file = client.files.upload(
  file=file_path,
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-2.0-flash",
  contents=[sample_file, prompt])
print(response.text)

You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get. Only the name (and by extension, the uri) are unique.

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

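# Upload the file; the SDK methods take keyword arguments
file = client.files.upload(file='example.txt')

# Fetch the stored file's metadata by its unique name
file_info = client.files.get(name=file.name)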
print(file_info.model_dump_json(indent=4))

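Multiple PDFs

The Gemini API can process multiple PDF documents in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.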

from google import genai
import io
import httpx

client = genai.Client()

doc_url_1 = "https://arxiv.org/pdf/2312.11805"
doc_url_2 = "https://arxiv.org/pdf/2403.05530"

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

sample_pdf_1 = client.files.upload(
  file=doc_data_1,
  config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
  file=doc_data_2,
  config=dict(mime_type='application/pdf')
)

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = client.models.generate_content(
  model="gemini-2.0-flash",
  contents=[sample_pdf_1, sample_pdf_2, prompt])
print(response.text)
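
Before sending several documents in one request, it can help to check roughly whether they fit in the context window. The sketch below uses the 258-tokens-per-page figure from the technical details. The function name fits_in_context is hypothetical, and the context window size and prompt token count are values the caller must supply for the chosen model; the numbers in the usage lines are placeholders, not documented limits.

```python
TOKENS_PER_PAGE = 258  # per-page figure from the technical details


def fits_in_context(page_counts, prompt_tokens, context_window_tokens):
    """Roughly check whether documents plus prompt fit in the context window.

    page_counts is an iterable with the page count of each PDF.
    """
    document_tokens = sum(page_counts) * TOKENS_PER_PAGE
    return document_tokens + prompt_tokens <= context_window_tokens


# Placeholder numbers: two PDFs, a 50-token prompt, a 1,000,000-token window.
print(fits_in_context([100, 200], 50, 1_000_000))   # True
print(fits_in_context([3000, 1000], 0, 1_000_000))  # False
```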

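What's next

To learn more, see the following resources:

File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.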