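Explore document processing capabilities with the Gemini API

The Gemini API supports PDF input, including long documents (up to 3,600 pages). Gemini models process PDFs with native vision, so they can understand both the text and the image content of a document. With native PDF vision support, Gemini models can:

Analyze diagrams, charts, and tables inside documents.
Extract information into structured output formats.
Answer questions about the visual and text content of documents.
Summarize documents.
Transcribe document content (for example, to HTML) while preserving layout and formatting, for use in downstream applications (such as RAG pipelines).

This tutorial demonstrates some possible ways to use the Gemini API with PDF documents. All output is text-only.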

Before you begin: Set up your project and API key

Before calling the Gemini API, you need to set up your project and configure your API key.

Get and secure your API key

You need an API key to call the Gemini API. If you don't already have one, create a key in Google AI Studio.

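It's strongly recommended that you do not check an API key into your version control system.

You should store your API key in a secrets store such as Google Cloud Secret Manager.

This tutorial assumes that you're accessing your API key as an environment variable.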
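
As a minimal sketch of that assumption, the helper below reads the key from the environment and fails early with a readable message when it is missing. The function name `load_api_key` is hypothetical and not part of the SDK; the variable name `GOOGLE_API_KEY` is the one this tutorial uses.

```python
import os


def load_api_key(env=os.environ, var_name="GOOGLE_API_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error.

    `load_api_key` is a hypothetical helper for illustration only.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before creating the client."
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.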

Install the SDK package and configure your API key

The Python SDK for the Gemini API is contained in the google-genai package.

Install the dependency using pip:
```
pip install -U google-genai
```
Put your API key in the GOOGLE_API_KEY environment variable:
```
export GOOGLE_API_KEY="YOUR_KEY_HERE"
```

Create an API client, which picks up the key from the environment:

from google import genai

client = genai.Client()

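Prompting with PDFs

This guide demonstrates how to upload and process PDFs, either through the File API or by including them as inline data.

Technical details

Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types: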

PDF - application/pdf
JavaScript - application/x-javascript, text/javascript
Python - application/x-python, text/x-python
TXT - text/plain
HTML - text/html
CSS - text/css
Markdown - text/md
CSV - text/csv
XML - text/xml
RTF - text/rtf
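
The list above can be turned into a small lookup that rejects unsupported files before any upload is attempted. This is only a sketch: the MIME types come from the list, but the file-extension mapping and the name `mime_type_for` are assumptions made for illustration.

```python
import pathlib

# MIME types are taken from the list above; pairing them with these
# file extensions is an assumption for this sketch.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".js": "text/javascript",
    ".py": "text/x-python",
    ".txt": "text/plain",
    ".html": "text/html",
    ".css": "text/css",
    ".md": "text/md",
    ".csv": "text/csv",
    ".xml": "text/xml",
    ".rtf": "text/rtf",
}


def mime_type_for(path):
    """Return a supported MIME type for `path`, or raise ValueError."""
    suffix = pathlib.Path(path).suffix.lower()
    if suffix not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported document type: {str(path)!r}")
    return SUPPORTED_MIME_TYPES[suffix]
```

For example, `mime_type_for("Report.PDF")` returns `application/pdf`, while a `.png` path raises `ValueError`.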

Each document page is equivalent to 258 tokens.
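
A rough token estimate follows directly from that figure. The sketch below multiplies the page count by 258 and enforces the 3,600-page limit stated in this guide; the function name is hypothetical, and actual usage reported by the API may differ once prompt text is included.

```python
TOKENS_PER_PAGE = 258  # per-page figure stated in this guide
MAX_PAGES = 3600       # page limit stated in this guide


def estimate_document_tokens(page_count):
    """Estimate the tokens consumed by a document of `page_count` pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if page_count > MAX_PAGES:
        raise ValueError(f"documents are limited to {MAX_PAGES} pages")
    return page_count * TOKENS_PER_PAGE
```

A document at the 3,600-page limit works out to 928,800 tokens under this estimate.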

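There is no specific limit on the number of pixels in a document other than the model's context window. Larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, and smaller pages are scaled up to 768x768 pixels. Smaller pages do not reduce cost, other than bandwidth, and higher-resolution pages do not improve performance.

For best results:

Rotate pages to the correct orientation before uploading.
Avoid blurry pages.
If using a single page, place the text prompt after the page.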

PDF input

For PDF payloads under 20 MB, you can choose between uploading base64-encoded documents and directly uploading locally stored files.
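
To make the base64 option concrete, here is a sketch that encodes PDF bytes and wraps them in a request body, with the prompt placed after the document. The JSON field names (`contents`, `parts`, `inline_data`, `mime_type`, `data`, `text`) are an assumption about the REST request shape and should be checked against the API reference; `build_inline_pdf_request` is a hypothetical helper. The Python SDK examples in this guide pass raw bytes to `types.Part.from_bytes` instead, so this step is only needed when building requests by hand.

```python
import base64


def build_inline_pdf_request(pdf_bytes, prompt):
    """Build a request body that carries a PDF as base64 inline data.

    The field names below are assumed, not taken from this guide.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"inline_data": {"mime_type": "application/pdf", "data": encoded}},
                {"text": prompt},  # prompt goes after the document
            ]
        }]
    }
```

Keep in mind that base64 encoding enlarges the payload by roughly one third, which counts against the request size.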

As inline data

You can process PDF documents directly from URLs. The following code snippet shows how to do this:

from google import genai
from google.genai import types
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Download the PDF bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

Locally stored PDFs

For locally stored PDFs, you can use the following approach:

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Download the PDF and save it to a local file
filepath = pathlib.Path('file.pdf')
filepath.write_bytes(httpx.get(doc_url).content)

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

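Large PDFs

You can use the File API to upload a document of any size. Always use the File API when the total request size (including the files, text prompt, system instructions, and so on) is larger than 20 MB.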
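
The size rule above can be sketched as a simple check. This is a minimal illustration under stated assumptions: "20 MB" is read as 20 × 1024 × 1024 bytes, sizes are counted as raw bytes without encoding overhead, and the helper name `should_use_file_api` is hypothetical.

```python
# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_use_file_api(file_bytes, other_bytes=0):
    """Return True when the total request size exceeds the inline limit.

    `other_bytes` covers the text prompt, system instructions, and any
    other request content.
    """
    return file_bytes + other_bytes > INLINE_LIMIT_BYTES
```

For instance, `should_use_file_api(len(pdf_bytes), len(prompt.encode("utf-8")))` gives a quick answer before choosing between inline data and an upload.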

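Call media.upload (client.files.upload in the Python SDK) to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent.

Large PDFs from URLs

Use the File API for large PDF files that are available from URLs. This simplifies uploading and processing such documents directly through their URLs: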

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

sample_doc = client.files.upload(
  # You can pass a path or a file-like object here
  file=doc_io,
  config=dict(
    # The SDK guesses the MIME type from the file extension; a file-like
    # object has no extension, so set mime_type explicitly
    mime_type='application/pdf')
)

prompt = "Summarize this document"


response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_doc, prompt])
print(response.text)

Large PDFs stored locally

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve the PDF
file_path = pathlib.Path('A17.pdf')
file_path.write_bytes(httpx.get(long_context_pdf_path).content)

# Upload the PDF using the File API
sample_file = client.files.upload(
  file=file_path,
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_file, prompt])
print(response.text)

You can verify that the API successfully stored the uploaded file, and get its metadata, by calling files.get. Only the name (and by extension, the uri) is unique.

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

file_info = client.files.get(name=file.name)
print(file_info.model_dump_json(indent=4))

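Multiple PDFs

The Gemini API can process multiple PDF documents in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.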
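
Before sending several documents at once, a rough budget check can show whether they are likely to fit. The sketch below reuses the 258-tokens-per-page figure from this guide; the context window size is a caller-supplied assumption (look it up for the model you use), and `fits_in_context` is a hypothetical helper.

```python
TOKENS_PER_PAGE = 258  # per-page figure stated in this guide


def fits_in_context(page_counts, prompt_tokens, context_window):
    """Return (fits, estimated_total_tokens) for a multi-document request.

    `context_window` is supplied by the caller; this guide does not
    state its value.
    """
    total = sum(page_counts) * TOKENS_PER_PAGE + prompt_tokens
    return total <= context_window, total
```

The estimate is deliberately coarse; token counts reported by the API remain the authoritative numbers.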

from google import genai
import io
import httpx

client = genai.Client()

doc_url_1 = "https://arxiv.org/pdf/2312.11805" # Replace with the URL to your first PDF
doc_url_2 = "https://arxiv.org/pdf/2403.05530" # Replace with the URL to your second PDF

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

sample_pdf_1 = client.files.upload(
  file=doc_data_1,
  config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
  file=doc_data_2,
  config=dict(mime_type='application/pdf')
)

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_pdf_1, sample_pdf_2, prompt])
print(response.text)

List files

You can list all files uploaded using the File API, along with their URIs, using files.list.

from google import genai

client = genai.Client()

print("My files:")
for f in client.files.list():
    print("  ", f.name)

Delete files

Files uploaded using the File API are automatically deleted after 2 days. You can also delete them manually using files.delete.
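
If your application tracks upload times itself, the retention window can be sketched as below. The 2-day figure comes from this guide; the helper names are hypothetical, and the service's own metadata should be treated as authoritative.

```python
from datetime import datetime, timedelta, timezone

FILE_RETENTION = timedelta(days=2)  # retention period stated in this guide


def deletion_time(uploaded_at):
    """Return the time at which an uploaded file is expected to be removed."""
    return uploaded_at + FILE_RETENTION


def is_expired(uploaded_at, now=None):
    """Return True once the retention window for the file has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= deletion_time(uploaded_at)
```

Use timezone-aware datetimes for `uploaded_at` so the comparison with the current UTC time is valid.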

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

client.files.delete(name=file.name)

Context caching with PDFs

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

document = client.files.upload(
  file=doc_io,
  config=dict(mime_type='application/pdf')
)

# Specify the model name and system instruction for caching
model_name = "gemini-1.5-flash-002" # Ensure this matches the model you intend to use
system_instruction = "You are an expert analyzing transcripts."

# Create a cached content object
cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      system_instruction=system_instruction,
      contents=[document], # The document(s) and other content you wish to cache
    )
)

# Display the cache details
print(f'{cache=}')

# Generate content using the cached prompt and document
response = client.models.generate_content(
  model=model_name,
  contents="Please summarize this transcript",
  config=types.GenerateContentConfig(
    cached_content=cache.name
  ))

# (Optional) Print usage metadata for insights into the API call
print(f'{response.usage_metadata=}')

# Print the generated text
print('\n\n', response.text)

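List caches

It's not possible to retrieve or view cached content, but you can retrieve cache metadata (name, model, display_name, usage_metadata, create_time, update_time, and expire_time).

To list metadata for all uploaded caches, use client.caches.list():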

from google import genai

client = genai.Client()
for c in client.caches.list():
  print(c)

Update a cache

You can set a new ttl or expire_time for a cache. Changing anything else about the cache isn't supported.
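
The two options can be derived from the same duration. As a sketch, the helpers below format a `timedelta` as a whole-second string such as `7200s` (the seconds-with-`s`-suffix form used for ttl in this guide) and compute an absolute expiry timestamp. Both helper names are hypothetical and not part of the SDK.

```python
from datetime import datetime, timedelta, timezone


def ttl_string(duration):
    """Format a positive timedelta as whole seconds with an 's' suffix."""
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        raise ValueError("ttl must be a positive duration")
    return f"{seconds}s"


def expire_time_from(duration, now=None):
    """Return the absolute UTC time that lies `duration` after `now`."""
    now = now or datetime.now(timezone.utc)
    return now + duration
```

Here `ttl_string(timedelta(hours=2))` yields `7200s`; choose either the relative ttl or the absolute expire_time, whichever suits your scheduling logic.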

The following example shows how to update the ttl of a cache using client.caches.update().

from google import genai
from google.genai import types
import datetime

client = genai.Client()

model_name = "models/gemini-1.5-flash-002" 

cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

client.caches.update(
  name = cache.name,
  config=types.UpdateCachedContentConfig(
    ttl=f'{datetime.timedelta(hours=2).total_seconds()}s'
  )
)

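Delete a cache

The caching service provides a delete operation for manually removing content from the cache. The following example shows how to delete a cache using client.caches.delete().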

from google import genai
from google.genai import types
import datetime

client = genai.Client()

model_name = "models/gemini-1.5-flash-002" 

cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

client.caches.delete(name = cache.name)

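What's next

This guide shows how to use generateContent to generate text output from processed documents. To learn more, see the following resources:

File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
Safety guidance: Generative AI models sometimes produce unexpected output, such as output that is inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such output.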