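Explore document processing capabilities with the Gemini API

The Gemini API supports PDF input, including long documents (up to 3,600 pages). Gemini models process PDFs with native vision, so they can understand both the text and the image content inside a document. With native PDF vision support, Gemini models can:

Analyze diagrams, charts, and tables inside documents.
Extract information into structured output formats.
Answer questions about the visual and text content of documents.
Summarize documents.
Transcribe document content (for example, to HTML) while preserving layout and formatting, for use in downstream applications such as RAG pipelines.

This tutorial demonstrates several ways to use the Gemini API with PDF documents. All output is text only.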

Before you begin: Set up your project and API key

Before calling the Gemini API, you need to set up your project and configure your API key.

Get and secure your API key

You need an API key to call the Gemini API. If you don't already have one, create a key in Google AI Studio.

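We strongly recommend that you do not check an API key into your version control system.

Store your API key in a secrets store such as Google Cloud Secret Manager.

This tutorial assumes that you access your API key as an environment variable.

Install the SDK package and configure your API key

The Python SDK for the Gemini API is contained in the google-genai package.

Install the dependency using pip: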
```
pip install -U google-genai
```
Put your API key in the GOOGLE_API_KEY environment variable:
```
export GOOGLE_API_KEY="YOUR_KEY_HERE"
```
When you create an API client, it picks up the key from the environment:
```
from google import genai

client = genai.Client()
```
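
If you want to confirm the key is available before creating the client, the check can be done with the standard library alone. The following is a minimal sketch; the helper name `load_api_key` is hypothetical and not part of the SDK.

```python
import os


def load_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    # `env` defaults to the process environment; passing a dict makes testing easy.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before creating the client")
    return key
```

The returned value can also be passed explicitly, as in `genai.Client(api_key=load_api_key())`, if you prefer not to rely on the implicit lookup.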

Prompting with PDFs

This guide demonstrates how to upload and process PDFs using the File API, or how to include them as inline data.

Technical details

Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types:

PDF - application/pdf
JavaScript - application/x-javascript, text/javascript
Python - application/x-python, text/x-python
TXT - text/plain
HTML - text/html
CSS - text/css
Markdown - text/md
CSV - text/csv
XML - text/xml
RTF - text/rtf
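
The list above can be turned into a small lookup so that a MIME type is chosen before a file is sent. This is only a sketch: the mapping from file extensions to MIME types is an assumption (the list names types, not extensions), the first listed type is used where two are given, and `mime_type_for` is a hypothetical helper.

```python
from pathlib import Path

# Extension-to-MIME-type table built from the list above.
# The file extensions chosen as keys are an assumption.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".js": "application/x-javascript",
    ".py": "application/x-python",
    ".txt": "text/plain",
    ".html": "text/html",
    ".css": "text/css",
    ".md": "text/md",
    ".csv": "text/csv",
    ".xml": "text/xml",
    ".rtf": "text/rtf",
}


def mime_type_for(path):
    """Return the MIME type for a supported document, or None if unsupported."""
    return SUPPORTED_MIME_TYPES.get(Path(path).suffix.lower())
```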

Each document page is equivalent to 258 tokens.
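
With a fixed per-page figure, the token footprint of a document can be estimated from its page count alone. A minimal sketch of that arithmetic (the function name is hypothetical, and the estimate covers only the document pages, not the prompt):

```python
TOKENS_PER_PAGE = 258  # per-page figure stated above


def estimate_document_tokens(page_count):
    """Estimate the tokens consumed by a document with `page_count` pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * TOKENS_PER_PAGE


print(estimate_document_tokens(3600))  # 928800 for a maximum-length document
```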

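There is no specific limit on the number of pixels in a document besides the model's context window. Larger pages are scaled down to a maximum resolution of 3,072 x 3,072 while preserving their original aspect ratio, and smaller pages are scaled up to 768 x 768 pixels. Smaller pages offer no cost reduction other than bandwidth, and higher-resolution pages offer no performance improvement.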
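
To get a feel for the scaling described above, the following sketch approximates the resulting page size. Several details are assumptions, since the text does not spell them out: that the bounds apply to the longest side, that aspect ratio is also preserved when scaling up, and that pages between the two bounds are left unchanged. The function name is hypothetical.

```python
MAX_SIDE = 3072  # upper bound stated above
MIN_SIDE = 768   # size that small pages are scaled up to


def scaled_page_size(width, height):
    """Approximate the page size after scaling, preserving aspect ratio."""
    longest = max(width, height)
    if longest > MAX_SIDE:
        scale = MAX_SIDE / longest   # shrink so the longest side is 3072
    elif longest < MIN_SIDE:
        scale = MIN_SIDE / longest   # enlarge so the longest side is 768
    else:
        scale = 1.0                  # already within bounds
    return round(width * scale), round(height * scale)
```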

For best results:

Rotate pages to the correct orientation before uploading.
Avoid blurry pages.
If using a single page, place the text prompt after the page.

PDF input

For PDF payloads under 20 MB, you can choose between uploading base64-encoded documents or directly uploading locally stored files.

As inline data

You can process PDF documents directly from URLs. The following code snippet shows how:

from google import genai
from google.genai import types
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Retrieve the PDF bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)
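
The SDK call above takes raw bytes. If you build the request yourself instead, the document has to be base64-encoded first. The sketch below assumes a request body made of `contents` and `parts` with an `inline_data` entry; treat that shape as an assumption to check against the REST reference. The helper name is hypothetical.

```python
import base64


def build_inline_pdf_request(pdf_bytes, prompt):
    """Build a JSON-serializable request body with the PDF as base64 inline data."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                # The document comes first, followed by the text prompt.
                {"inline_data": {"mime_type": "application/pdf", "data": encoded}},
                {"text": prompt},
            ]
        }]
    }


body = build_inline_pdf_request(b"%PDF-1.4 example bytes", "Summarize this document")
```

Base64 encoding inflates the payload by roughly one third, which matters when a document is close to the 20 MB limit.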

Locally stored PDFs

For locally stored PDFs, you can use the following approach:

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Download the PDF and save it to a local file
filepath = pathlib.Path('file.pdf')
filepath.write_bytes(httpx.get(doc_url).content)

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

Large PDFs

You can use the File API to upload a document of any size. Always use the File API when the total request size (including the files, text prompt, system instructions, and so on) is larger than 20 MB.
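
The size rule can be expressed as a small decision helper. This is a sketch under stated assumptions: "20 MB" is read as 20 x 1024 x 1024 bytes, the sizes are counted before any base64 encoding, and the function name and return values are hypothetical.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def choose_upload_method(pdf_size, prompt_size=0, system_instruction_size=0):
    """Return "inline" or "file_api" based on the total request size in bytes."""
    total = pdf_size + prompt_size + system_instruction_size
    return "file_api" if total > INLINE_LIMIT_BYTES else "inline"
```

For example, `choose_upload_method(25 * 1024 * 1024)` returns `"file_api"`, while a 1 KB document with a short prompt stays inline.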

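Call media.upload to upload a file using the File API (client.files.upload in the Python SDK). The following code uploads a document file and then uses the file in a call to models.generateContent.

Large PDFs from URLs

Use the File API for large PDF files available from URLs, which simplifies uploading and processing these documents directly from their URLs: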

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

sample_doc = client.files.upload(
  # You can pass a path or a file-like object here
  file=doc_io,
  config=dict(
    # The SDK guesses the MIME type from the file extension, but if you pass
    # a file-like object, you need to set the mime_type explicitly
    mime_type='application/pdf')
)

prompt = "Summarize this document"


response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_doc, prompt])
print(response.text)

Large PDFs stored locally

from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve the PDF
file_path = pathlib.Path('A17.pdf')
file_path.write_bytes(httpx.get(long_context_pdf_path).content)

# Upload the PDF using the File API
sample_file = client.files.upload(
  file=file_path,
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_file, prompt])
print(response.text)

You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get. Only the name (and by extension, the uri) are unique.

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

file_info = client.files.get(name=file.name)
print(file_info.model_dump_json(indent=4))

Multiple PDFs

The Gemini API can process multiple PDF documents in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.

from google import genai
import io
import httpx

client = genai.Client()

doc_url_1 = "https://arxiv.org/pdf/2312.11805" # Replace with the URL to your first PDF
doc_url_2 = "https://arxiv.org/pdf/2403.05530" # Replace with the URL to your second PDF

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

sample_pdf_1 = client.files.upload(
  file=doc_data_1,
  config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
  file=doc_data_2,
  config=dict(mime_type='application/pdf')
)

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_pdf_1, sample_pdf_2, prompt])
print(response.text)
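
Before sending several documents in one request, you can roughly check whether they fit. The sketch below uses the 258-tokens-per-page figure from the technical details; the function name is hypothetical, and the context window size is passed in rather than hard-coded because it depends on the model.

```python
TOKENS_PER_PAGE = 258  # per-page figure from the technical details


def fits_in_context(page_counts, prompt_tokens, context_window):
    """Check whether several documents plus a prompt fit in the context window."""
    document_tokens = sum(pages * TOKENS_PER_PAGE for pages in page_counts)
    return document_tokens + prompt_tokens <= context_window
```

For instance, two documents of 60 and 90 pages use 38,700 document tokens, so `fits_in_context([60, 90], prompt_tokens=50, context_window=1_000_000)` returns `True`.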

List files

You can list all files uploaded using the File API and their URIs using files.list.

from google import genai

client = genai.Client()

print("My files:")
for f in client.files.list():
    print("  ", f.name)

Delete files

Files uploaded using the File API are automatically deleted after 2 days. You can also delete them manually using files.delete.

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

client.files.delete(name=file.name)
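
If your application tracks upload times itself, it can predict when a file will disappear and re-upload it in advance. A minimal sketch, assuming the 2-day retention means exactly 48 hours from upload; both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FILE_RETENTION = timedelta(days=2)  # retention period stated above


def expected_deletion_time(uploaded_at):
    """Return when a file uploaded at `uploaded_at` should be auto-deleted."""
    return uploaded_at + FILE_RETENTION


def is_probably_expired(uploaded_at, now=None):
    """Return True if the retention period has elapsed since the upload."""
    now = datetime.now(timezone.utc) if now is None else now
    return now >= expected_deletion_time(uploaded_at)
```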

Context caching with PDFs

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

document = client.files.upload(
  file=doc_io,
  config=dict(mime_type='application/pdf')
)

# Specify the model name and system instruction for caching
model_name = "gemini-1.5-flash-002" # Ensure this matches the model you intend to use
system_instruction = "You are an expert analyzing transcripts."

# Create a cached content object
cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      system_instruction=system_instruction,
      contents=[document], # The document(s) and other content you wish to cache
    )
)

# Display the cache details
print(f'{cache=}')

# Generate content using the cached prompt and document
response = client.models.generate_content(
  model=model_name,
  contents="Please summarize this transcript",
  config=types.GenerateContentConfig(
    cached_content=cache.name
  ))

# (Optional) Print usage metadata for insights into the API call
print(f'{response.usage_metadata=}')

# Print the generated text
print('\n\n', response.text)

List caches

You cannot retrieve or view cached content, but you can retrieve cache metadata (name, model, display_name, usage_metadata, create_time, update_time, and expire_time).

To list metadata for all uploaded caches, use client.caches.list():

from google import genai

client = genai.Client()
for c in client.caches.list():
  print(c)

Update a cache

You can set a new ttl or expire_time for a cache. Changing anything else about the cache is not supported.

The following example shows how to update the ttl of a cache using client.caches.update():

from google import genai
from google.genai import types
import datetime

client = genai.Client()

model_name = "models/gemini-1.5-flash-002" 

cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

client.caches.update(
  name = cache.name,
  config=types.UpdateCachedContentConfig(
    ttl=f'{datetime.timedelta(hours=2).total_seconds()}s'
  )
)
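
Because `total_seconds()` returns a float, the f-string in the example above produces the string `7200.0s`. If you prefer a whole-second value, a small formatter can be used instead. This is a sketch with a hypothetical helper name; it makes no claim about which of the two forms the service prefers.

```python
import datetime


def ttl_string(duration):
    """Format a timedelta as a whole-second duration string such as "7200s"."""
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        raise ValueError("ttl must be positive")
    return f"{seconds}s"


print(ttl_string(datetime.timedelta(hours=2)))  # 7200s
```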

Delete a cache

The caching service provides a delete operation for manually removing content from the cache. The following example shows how to delete a cache using client.caches.delete():

from google import genai
from google.genai import types
import datetime

client = genai.Client()

model_name = "models/gemini-1.5-flash-002" 

cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

client.caches.delete(name = cache.name)

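What's next

This guide shows how to use generateContent to generate text outputs from processed documents. To learn more, see the following resources:

File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
Safety guidance: Generative AI models sometimes produce unexpected output, such as output that is inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs.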