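Explore document processing capabilities with the Gemini API

The Gemini API supports PDF input, including long documents (up to 3,600 pages). Gemini models process PDFs with native vision, so they can understand both the text and the image content inside a document. With native PDF vision support, Gemini models can:

Analyze diagrams, charts, and tables inside documents.
Extract information into structured output formats.
Answer questions about the visual and text content of documents.
Summarize documents.
Transcribe document content (for example, to HTML) while preserving layout and formatting, for use in downstream applications such as RAG pipelines.

This tutorial demonstrates several ways to use the Gemini API with PDF documents. All output is text-only.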

Before you begin: Set up your project and API key

Before calling the Gemini API, you need to set up your project and configure your API key.

Get and secure your API key

You need an API key to call the Gemini API. If you don't already have one, create a key in Google AI Studio.

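We strongly recommend that you do not check an API key into your version control system.

You should store your API key in a secrets store such as Google Cloud Secret Manager.

This tutorial assumes that you're accessing your API key as an environment variable.

Install the SDK package and configure your API key

The Python SDK for the Gemini API is contained in the google-genai package.

Install the dependency using pip: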
```
pip install -U google-genai
```
Put your API key in the GOOGLE_API_KEY environment variable:
```
export GOOGLE_API_KEY="YOUR_KEY_HERE"
```
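
If you want your own scripts to fail early when the key is missing, a small check like the following may help. This is a minimal sketch: `load_api_key` is a hypothetical helper name, not part of the google-genai SDK, and it only assumes the GOOGLE_API_KEY variable name used in this tutorial.

```python
import os


def load_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise if it is unset.

    Illustrative helper only; it is not part of the google-genai SDK.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the examples."
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.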

Create an API client. It picks up the key from the environment:

from google import genai

client = genai.Client()

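Prompting with PDFs

This guide demonstrates how to upload and process PDFs using the File API, or by including them as inline data.

Technical details

Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types: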

PDF - application/pdf
JavaScript - application/x-javascript, text/javascript
Python - application/x-python, text/x-python
TXT - text/plain
HTML - text/html
CSS - text/css
Markdown - text/md
CSV - text/csv
XML - text/xml
RTF - text/rtf
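
The list above can be turned into a lookup table so that a script picks the MIME type from a file name. This is a sketch under stated assumptions: the file extensions and the helper name `mime_type_for` are illustrative choices (the list itself names only formats and MIME types), and where two MIME types are listed the first one is used arbitrarily.

```python
from pathlib import Path

# Extension-to-MIME-type table built from the list above.
# The extensions are assumptions; the MIME types come from the list.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".js": "application/x-javascript",
    ".py": "application/x-python",
    ".txt": "text/plain",
    ".html": "text/html",
    ".css": "text/css",
    ".md": "text/md",
    ".csv": "text/csv",
    ".xml": "text/xml",
    ".rtf": "text/rtf",
}


def mime_type_for(path):
    """Return the MIME type for a supported document, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type: {path!r}") from None
```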

Each document page is equivalent to 258 tokens.
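
The per-page figure makes it possible to estimate a document's token cost before sending it. The following is a rough sketch that uses only the 258-tokens-per-page figure and the 3,600-page limit stated in this guide; `estimate_document_tokens` is a hypothetical helper, and the estimate ignores the tokens used by your text prompt.

```python
TOKENS_PER_PAGE = 258  # per-page figure stated in this guide
MAX_PAGES = 3600       # page limit stated in this guide


def estimate_document_tokens(page_count):
    """Estimate the tokens consumed by a document with `page_count` pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count > MAX_PAGES:
        raise ValueError(f"documents are limited to {MAX_PAGES} pages")
    return page_count * TOKENS_PER_PAGE


print(estimate_document_tokens(100))  # 25800
```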

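There is no specific limit on the number of pixels in a document beyond the model's context window. Larger pages are scaled down to a maximum resolution of 3072 x 3072 while preserving their original aspect ratio, and smaller pages are scaled up to 768 x 768 pixels. Apart from bandwidth, smaller pages do not reduce cost, and higher-resolution pages do not improve performance.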
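
To get a feel for the resizing described above, the sketch below computes the dimensions a page might end up with. The exact resizing rule is not spelled out here, so this is one possible reading, labeled as an assumption: a page counts as "larger" when its longer side exceeds 3072 pixels and as "smaller" when its longer side is under 768 pixels. `scaled_page_size` is a hypothetical helper.

```python
MAX_SIDE = 3072  # maximum resolution stated above
MIN_SIDE = 768   # minimum resolution stated above


def scaled_page_size(width, height):
    """Return (width, height) after resizing, preserving the aspect ratio.

    Assumption: the longer side decides whether a page is scaled down
    (longer side > 3072) or scaled up (longer side < 768).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest > MAX_SIDE:
        factor = MAX_SIDE / longest
    elif longest < MIN_SIDE:
        factor = MIN_SIDE / longest
    else:
        return width, height  # already within bounds; left unchanged
    return round(width * factor), round(height * factor)
```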

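For best results:

Rotate pages to the correct orientation before uploading.
Avoid blurry pages.
If using a single page, place the text prompt after the page.

PDF input

For PDF payloads under 20 MB, you can choose between uploading base64-encoded documents or directly uploading locally stored files.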
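
If you build requests yourself rather than through the SDK, the base64 option amounts to encoding the PDF bytes and embedding them in the request body. The sketch below shows that encoding step only and sends nothing. The field names (`contents`, `parts`, `inline_data`, `mime_type`, `data`) reflect the REST request shape as understood at the time of writing; treat them as an assumption and check the API reference. `build_inline_pdf_request` is a hypothetical helper.

```python
import base64
import json


def build_inline_pdf_request(pdf_bytes, prompt):
    """Build a JSON-serializable request body with a base64-encoded PDF.

    The document part comes first and the text prompt second.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {
                        "inline_data": {
                            "mime_type": "application/pdf",
                            "data": encoded,
                        }
                    },
                    {"text": prompt},
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real PDF file.
body = build_inline_pdf_request(b"%PDF-1.4 placeholder", "Summarize this document")
print(len(json.dumps(body)))
```

When you use the Python SDK, `types.Part.from_bytes` accepts the raw bytes directly, so you do not need to encode them yourself.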

As inline data

You can process PDF documents directly from URLs. The following code snippet shows how:

from google import genai
from google.genai import types
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Retrieve the PDF bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

Locally stored PDFs

For locally stored PDFs, you can use the following approach:

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Download the PDF and save it to a local file
filepath = pathlib.Path('file.pdf')
filepath.write_bytes(httpx.get(doc_url).content)

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

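Large PDFs

You can use the File API to upload a document of any size. Always use the File API when the total request size (including the files, text prompt, system instructions, and so on) is larger than 20 MB.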
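
The 20 MB rule can be expressed as a small pre-flight check. This is a sketch, not an official algorithm: it reads "20 MB" as 20 * 1024 * 1024 bytes, counts only file bytes plus the UTF-8 size of the prompt and system instruction, and ignores any encoding overhead (such as base64) that a real request adds. `choose_upload_method` is a hypothetical helper.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def choose_upload_method(file_sizes, prompt, system_instruction=""):
    """Return "file_api" if the estimated request exceeds the limit, else "inline".

    file_sizes: iterable of file sizes in bytes.
    """
    total = (
        sum(file_sizes)
        + len(prompt.encode("utf-8"))
        + len(system_instruction.encode("utf-8"))
    )
    return "file_api" if total > INLINE_LIMIT_BYTES else "inline"
```

Because inline data is encoded before it is sent, it may be prudent to switch to the File API somewhat below the limit rather than right at it.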

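Call media.upload to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent.

Large PDFs from URLs

For large PDF files that are available from a URL, use the File API to upload and process the document without saving it to disk first: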

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

sample_doc = client.files.upload(
  # You can pass a path or a file-like object here
  file=doc_io,
  config=dict(
    # The SDK guesses the MIME type from the file extension, but if you pass
    # a file-like object, you need to set the MIME type explicitly.
    mime_type='application/pdf')
)

prompt = "Summarize this document"


response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_doc, prompt])
print(response.text)

Large PDFs stored locally

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve the PDF
file_path = pathlib.Path('A17.pdf')
file_path.write_bytes(httpx.get(long_context_pdf_path).content)

# Upload the PDF using the File API
sample_file = client.files.upload(
  file=file_path,
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_file, prompt])
print(response.text)

You can verify that the API successfully stored the uploaded file, and get its metadata, by calling files.get. Only the name (and by extension, the uri) is unique.

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

file_info = client.files.get(name=file.name)
print(file_info.model_dump_json(indent=4))

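Multiple PDFs

The Gemini API can process multiple PDF documents in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.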

from google import genai
import io
import httpx

client = genai.Client()

doc_url_1 = "https://arxiv.org/pdf/2312.11805" # Replace with the URL to your first PDF
doc_url_2 = "https://arxiv.org/pdf/2403.05530" # Replace with the URL to your second PDF

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

sample_pdf_1 = client.files.upload(
  file=doc_data_1,
  config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
  file=doc_data_2,
  config=dict(mime_type='application/pdf')
)

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_pdf_1, sample_pdf_2, prompt])
print(response.text)
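
Before sending several PDFs in one request, you may want a rough check that they fit in the context window. The sketch below applies the 258-tokens-per-page figure from this guide. The context window size and the prompt token count are values you supply, since they depend on the model and the prompt, and `fits_in_context` is a hypothetical helper.

```python
TOKENS_PER_PAGE = 258  # per-page figure stated in this guide


def fits_in_context(page_counts, prompt_tokens, context_window):
    """Return (fits, total_tokens) for a request with several documents.

    page_counts: number of pages in each document.
    prompt_tokens: estimated tokens in the text prompt (caller-supplied).
    context_window: the model's context window in tokens (caller-supplied).
    """
    total = sum(page_counts) * TOKENS_PER_PAGE + prompt_tokens
    return total <= context_window, total


# Example with illustrative numbers: two documents of 60 and 150 pages.
print(fits_in_context([60, 150], prompt_tokens=30, context_window=1_000_000))
# (True, 54210)
```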

List files

You can list all files uploaded using the File API, along with their URIs, by calling files.list.

from google import genai

client = genai.Client()

print("My files:")
for f in client.files.list():
    print("  ", f.name)

Delete files

Files uploaded using the File API are automatically deleted after 2 days. You can also delete them manually using files.delete.

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

client.files.delete(name=file.name)
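
If your application keeps its own record of uploads, it can predict when a file will no longer be available. This is a minimal sketch assuming the 2-day retention is measured from the upload time as exactly 48 hours; the helper names are hypothetical, and the service remains the source of truth for when a file is actually removed.

```python
from datetime import datetime, timedelta, timezone

FILE_RETENTION = timedelta(days=2)  # retention period stated above


def expires_at(uploaded_at):
    """Return the time at which an uploaded file is expected to be deleted."""
    return uploaded_at + FILE_RETENTION


def is_expired(uploaded_at, now=None):
    """Return True if the file is expected to have been deleted by `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at(uploaded_at)
```

Using timezone-aware datetimes throughout avoids comparison errors between naive and aware values.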

Context caching with PDFs

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

document = client.files.upload(
  file=doc_io,
  config=dict(mime_type='application/pdf')
)

# Specify the model name and system instruction for caching
model_name = "gemini-1.5-flash-002" # Ensure this matches the model you intend to use
system_instruction = "You are an expert analyzing transcripts."

# Create a cached content object
cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      system_instruction=system_instruction,
      contents=[document], # The document(s) and other content you wish to cache
    )
)

# Display the cache details
print(f'{cache=}')

# Generate content using the cached prompt and document
response = client.models.generate_content(
  model=model_name,
  contents="Please summarize this transcript",
  config=types.GenerateContentConfig(
    cached_content=cache.name
  ))

# (Optional) Print usage metadata for insights into the API call
print(f'{response.usage_metadata=}')

# Print the generated text
print('\n\n', response.text)

List caches

You cannot retrieve or view cached content, but you can retrieve cache metadata (name, model, display_name, usage_metadata, create_time, update_time, and expire_time).

To list metadata for all uploaded caches, use client.caches.list():

from google import genai

client = genai.Client()
for c in client.caches.list():
  print(c)
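
To log only the metadata fields named above instead of printing the whole object, you could collect them into a dictionary. The sketch below uses a stand-in object so that it runs without an API call; `cache_summary` is a hypothetical helper, and it assumes that real cache entries expose these fields as attributes.

```python
from types import SimpleNamespace

# Metadata fields listed in this guide.
CACHE_METADATA_FIELDS = (
    "name",
    "model",
    "display_name",
    "usage_metadata",
    "create_time",
    "update_time",
    "expire_time",
)


def cache_summary(cache):
    """Return the listed metadata fields as a dict; missing fields become None."""
    return {field: getattr(cache, field, None) for field in CACHE_METADATA_FIELDS}


# Stand-in object for demonstration; a real entry would come from the cache list.
fake_cache = SimpleNamespace(
    name="cachedContents/example",
    model="models/gemini-1.5-flash-002",
    display_name="demo",
)
print(cache_summary(fake_cache))
```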

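Update a cache

You can set a new ttl or expire_time for a cache. Changing anything else about the cache is not supported.

The following example shows how to update the ttl of a cache using client.caches.update().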

from google import genai
from google.genai import types
import datetime

client = genai.Client()

model_name = "models/gemini-1.5-flash-002" 

cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

client.caches.update(
  name = cache.name,
  config=types.UpdateCachedContentConfig(
    ttl=f'{datetime.timedelta(hours=2).total_seconds()}s'
  )
)
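
The ttl in the example above is a duration written as a number of seconds followed by "s". A small helper can build that string from a timedelta. This is a sketch with a hypothetical helper name; it rounds down to whole seconds, whereas the f-string in the example above produces a fractional form such as "7200.0s".

```python
import datetime


def ttl_string(delta):
    """Format a positive timedelta as a whole-second duration string."""
    seconds = int(delta.total_seconds())
    if seconds <= 0:
        raise ValueError("ttl must be at least one second")
    return f"{seconds}s"


print(ttl_string(datetime.timedelta(hours=2)))  # 7200s
```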

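Delete a cache

The caching service provides a delete operation for manually removing content from the cache. The following example shows how to delete a cache using client.caches.delete().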

from google import genai
from google.genai import types
import datetime

client = genai.Client()

model_name = "models/gemini-1.5-flash-002" 

cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

client.caches.delete(name = cache.name)

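What's next

This guide shows how to use generateContent to generate text output from processed documents. To learn more, see the following resources:

File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
Safety guidance: Generative AI models sometimes produce unexpected output, such as output that is inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such output.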