
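Explore document processing capabilities with the Gemini API

The Gemini API supports PDF input, including long documents (up to 3,600 pages). Gemini models process PDFs with native vision, so they can understand both the text and the image content inside documents. With native PDF vision support, Gemini models can:

Analyze diagrams, charts, and tables inside documents.
Extract information into structured output formats.
Answer questions about the visual and text content in documents.
Summarize documents.
Transcribe document content (for example, to HTML) while preserving layouts and formatting, for use in downstream applications (such as RAG pipelines).

This tutorial demonstrates some possible ways to use the Gemini API with PDF documents. All output is text-only.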

Before you begin: Set up your project and API key

Before calling the Gemini API, you need to set up your project and configure your API key.


Get and secure your API key

You need an API key to call the Gemini API. If you don't already have one, create a key in Google AI Studio.


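Don't check your API key into your version control system.

Store your API key in a secrets store such as Google Cloud Secret Manager.

This tutorial assumes that you're accessing your API key as an environment variable.

Install the SDK package and configure your API key

The Python SDK for the Gemini API is contained in the google-genai package.

Install the dependency using pip: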
```
pip install -U google-genai
```
Put your API key in the GOOGLE_API_KEY environment variable:
```
export GOOGLE_API_KEY="YOUR_KEY_HERE"
```
Create an API client. It picks up the key from the environment:
```
from google import genai

client = genai.Client()
```
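The `genai.Client()` call above reads the key from the environment on its own. If you'd rather fail early with a clear message when the variable is missing, a small check like the following can run first. This is a minimal sketch; `require_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def require_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, or raise a descriptive error.

    Hypothetical helper: the SDK performs its own lookup. This only surfaces
    a missing key earlier and with a clearer message.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before creating genai.Client()."
        )
    return key
```

For example, call `require_api_key()` on the line before `client = genai.Client()`.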

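Prompting with PDFs

This guide demonstrates how to upload and process PDFs, either by using the File API or by including them as inline data.

Technical details

Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types: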

PDF - application/pdf
JavaScript - application/x-javascript, text/javascript
Python - application/x-python, text/x-python
TXT - text/plain
HTML - text/html
CSS - text/css
Markdown - text/md
CSV - text/csv
XML - text/xml
RTF - text/rtf
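If you build requests from local files, you can map file extensions to the MIME types listed above and reject anything else before sending. The following is a sketch under these assumptions: the extension-to-type mapping is inferred from the list, the first type is used wherever the list gives two, and `mime_type_for` is a hypothetical helper. Python's `mimetypes` module can also guess types, but its guesses are platform-dependent and may not match this list.

```python
from pathlib import Path

# Extension -> MIME type, based on the supported types listed above.
# Where the list gives two options, the first one is used here.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".js": "application/x-javascript",
    ".py": "application/x-python",
    ".txt": "text/plain",
    ".html": "text/html",
    ".css": "text/css",
    ".md": "text/md",
    ".csv": "text/csv",
    ".xml": "text/xml",
    ".rtf": "text/rtf",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported document, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type: {suffix or '(none)'}") from None
```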

Each document page is equivalent to 258 tokens.
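Because each page counts as 258 tokens, you can estimate a document's token cost from its page count before you upload it. The sketch below applies that rule together with the 3,600-page limit stated above. The function name is hypothetical, and the estimate covers only the document pages, not your prompt or the model's output. For exact counts, use the SDK's `client.models.count_tokens`.

```python
TOKENS_PER_PAGE = 258
MAX_PAGES = 3_600


def estimate_document_tokens(page_count: int) -> int:
    """Estimate tokens for a document using the 258-tokens-per-page rule."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count > MAX_PAGES:
        raise ValueError(f"{page_count} pages exceeds the {MAX_PAGES}-page limit")
    return page_count * TOKENS_PER_PAGE


print(estimate_document_tokens(10))     # 2580
print(estimate_document_tokens(3_600))  # 928800
```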

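There is no specific limit on the number of pixels in a document beyond the model's context window. Larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, and smaller pages are scaled up to 768x768 pixels. Smaller pages don't reduce cost, apart from bandwidth, and higher-resolution pages don't improve performance.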
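The downscaling rule for large pages amounts to fitting the page inside a 3072x3072 box while keeping its aspect ratio. The following sketch illustrates only that arithmetic. It assumes the service applies one uniform scale factor and rounds to whole pixels. It does not model the upscaling of small pages, and the exact resampling the service performs is not documented here. The helper name is hypothetical.

```python
MAX_SIDE = 3072


def downscaled_size(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Return (width, height) scaled to fit within max_side x max_side.

    Pages that already fit are returned unchanged. The aspect ratio is
    preserved by applying a single scale factor to both dimensions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(max_side / width, max_side / height, 1.0)
    return round(width * scale), round(height * scale)


print(downscaled_size(6144, 4096))  # (3072, 2048)
print(downscaled_size(1000, 800))   # (1000, 800)
```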

For best results:

Rotate pages to the correct orientation before uploading.
Avoid blurry pages.
If using a single page, place the text prompt after the page.

PDF input

For PDF payloads under 20 MB, you can either upload base64-encoded documents or directly upload locally stored files.
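The Python SDK handles encoding for you when you pass raw bytes to `types.Part.from_bytes`. If you call the REST `generateContent` endpoint directly instead, the PDF travels base64-encoded inside an `inline_data` part. The sketch below builds that JSON body with only the standard library. The field names follow the REST request format, and `build_inline_pdf_request` is a hypothetical helper.

```python
import base64
import json


def build_inline_pdf_request(pdf_bytes: bytes, prompt: str) -> str:
    """Return a generateContent JSON body with the PDF as base64 inline data."""
    body = {
        "contents": [
            {
                "parts": [
                    {
                        "inline_data": {
                            "mime_type": "application/pdf",
                            # base64 makes the payload about a third larger
                            "data": base64.b64encode(pdf_bytes).decode("ascii"),
                        }
                    },
                    # Text prompt placed after the document part
                    {"text": prompt},
                ]
            }
        ]
    }
    return json.dumps(body)
```

Because base64 encoding adds roughly 33% to the raw size, leave headroom when a file is close to the 20 MB limit.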

As inline data

You can process PDF documents directly from URLs. Here's a code snippet showing how to do this:

```
from google import genai
from google.genai import types
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Download the PDF bytes; the SDK encodes them for the request
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)
```
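`httpx.get(doc_url).content` returns whatever the server sent, and that could be an HTML error page rather than a PDF. A cheap sanity check is to look for the `%PDF-` header that PDF files begin with before you send the bytes to the model. This is a heuristic, not full validation. The helper name is hypothetical, and searching the first 1024 bytes (instead of requiring the header at byte 0) is a lenient choice made here.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check that `data` is a PDF.

    PDF files start with a "%PDF-" header. Some readers tolerate a few
    leading bytes, so this searches within the first 1024 bytes.
    """
    return b"%PDF-" in data[:1024]
```

In the example above, you could call `response.raise_for_status()` on the httpx response and then check `looks_like_pdf(response.content)` before building the `Part`.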

Locally stored PDFs

For locally stored PDFs, you can use the following approach:

```
from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Download the PDF and save it locally (skip this if you already have a file)
filepath = pathlib.Path('file.pdf')
filepath.write_bytes(httpx.get(doc_url).content)

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)
```

Large PDFs

You can use the File API to upload a document of any size. Always use the File API when the total request size (including the files, text prompt, system instructions, and so on) is larger than 20 MB.
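The 20 MB rule applies to the whole request, not just the PDF. Here is a sketch of that decision. It assumes 20 MB means 20 * 1024 * 1024 bytes and that text is measured as UTF-8. Both are assumptions, so leave headroom instead of relying on the exact boundary. The helper name is hypothetical.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: 20 MB read as MiB


def needs_file_api(file_sizes: list[int], prompt: str, system_instruction: str = "") -> bool:
    """Return True if the combined request size exceeds the inline limit."""
    total = sum(file_sizes)
    total += len(prompt.encode("utf-8"))
    total += len(system_instruction.encode("utf-8"))
    return total > INLINE_LIMIT_BYTES
```

For example, `needs_file_api([pathlib.Path("file.pdf").stat().st_size], prompt)` tells you whether to switch from `types.Part.from_bytes` to `client.files.upload`.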

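Call media.upload (client.files.upload in the Python SDK) to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent.

Large PDFs from URLs

For large PDF files available at a URL, use the File API to upload and process them directly from the URL: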

```
from google import genai
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve the PDF into memory
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

# Upload the PDF using the File API
sample_doc = client.files.upload(
  # You can pass a path or a file-like object here
  file=doc_io,
  config=dict(
    # The MIME type is guessed from the file extension when you pass a path,
    # but for a file-like object you need to set it explicitly
    mime_type='application/pdf')
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_doc, prompt])
print(response.text)
```

Large PDFs stored locally

```
from google import genai
import pathlib
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve the PDF and save it locally
file_path = pathlib.Path('A17.pdf')
file_path.write_bytes(httpx.get(long_context_pdf_path).content)

# Upload the PDF using the File API
sample_file = client.files.upload(
  file=file_path,
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_file, prompt])
print(response.text)
```

You can verify that the API successfully stored the uploaded file, and get its metadata, by calling files.get. Only the name (and by extension, the uri) is unique.

```
from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

# Fetch the stored file's metadata by its unique name
file_info = client.files.get(name=file.name)
print(file_info.model_dump_json(indent=4))
```
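The metadata returned by `files.get` includes a `state` field. PDFs are usually ready right away, but if you want to be defensive (for example, with very large files), you can poll until the state is `ACTIVE`. In this sketch, the lookup is passed in as a callable so the code can be tested without network access. In real use you would pass `lambda name: client.files.get(name=name)`. The helper name, timeout, and polling interval are assumptions.

```python
import time
from typing import Any, Callable


def _state_name(state: Any) -> str:
    # The SDK's FileState is an enum; fall back to str() for plain values.
    return getattr(state, "name", str(state))


def wait_until_active(
    get_file: Callable[[str], Any],
    name: str,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> Any:
    """Poll get_file(name) until its state is ACTIVE; raise on FAILED or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        file = get_file(name)
        state = _state_name(file.state)
        if state == "ACTIVE":
            return file
        if state == "FAILED":
            raise RuntimeError(f"Processing failed for {name}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} still {state} after {timeout_s}s")
        time.sleep(interval_s)
```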

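Multiple PDFs

The Gemini API can process multiple PDF documents in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.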

```
from google import genai
import io
import httpx

client = genai.Client()

doc_url_1 = "https://arxiv.org/pdf/2312.11805" # Replace with the URL to your first PDF
doc_url_2 = "https://arxiv.org/pdf/2403.05530" # Replace with the URL to your second PDF

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

sample_pdf_1 = client.files.upload(
  file=doc_data_1,
  config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
  file=doc_data_2,
  config=dict(mime_type='application/pdf')
)

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = client.models.generate_content(
  model="gemini-1.5-flash",
  contents=[sample_pdf_1, sample_pdf_2, prompt])
print(response.text)
```
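You can estimate whether several PDFs fit in one request with the same 258-tokens-per-page rule. The sketch below assumes you know each document's page count and an approximate prompt token count. The context window is a parameter you set for your model. The 1,048,576 value in the example is only an illustrative assumption, so check your model's documented input limit. The helper name is hypothetical.

```python
TOKENS_PER_PAGE = 258


def fits_in_context(page_counts: list[int], prompt_tokens: int, context_window: int) -> bool:
    """Estimate whether all documents plus the prompt fit in the context window."""
    document_tokens = sum(page_counts) * TOKENS_PER_PAGE
    return document_tokens + prompt_tokens <= context_window


# Illustrative assumption: a 1,048,576-token input limit.
print(fits_in_context([3_600], prompt_tokens=1_000, context_window=1_048_576))         # True
print(fits_in_context([3_600, 3_600], prompt_tokens=1_000, context_window=1_048_576))  # False
```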

List files

You can use files.list to list all files uploaded with the File API, along with their URIs.

```
from google import genai

client = genai.Client()

print("My files:")
for f in client.files.list():
    # Print each file's unique name and its URI
    print("  ", f.name, f.uri)
```

Delete files

Files uploaded using the File API are automatically deleted after 2 days. You can also delete them manually using files.delete.

```
from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

# Delete the uploaded file by its unique name
client.files.delete(name=file.name)
```
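To clean up everything you uploaded without waiting for automatic deletion, you can combine `files.list` and `files.delete`. In this sketch, `delete_all_files` is a hypothetical helper that takes the client as a parameter so a stand-in object can be used for testing. It deletes every file visible to your API key, so use it with care.

```python
from typing import Any


def delete_all_files(client: Any) -> list[str]:
    """Delete every file returned by client.files.list(); return the deleted names."""
    # Collect names first so deleting doesn't interfere with iterating the listing.
    names = [f.name for f in client.files.list()]
    for name in names:
        client.files.delete(name=name)
    return names
```

In practice you would call `delete_all_files(genai.Client())`.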

Context caching with PDFs

```
from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Replace with the actual URL of your large PDF

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

document = client.files.upload(
  file=doc_io,
  config=dict(mime_type='application/pdf')
)

# Specify the model name and system instruction for caching
model_name = "gemini-1.5-flash-002" # Ensure this matches the model you intend to use
system_instruction = "You are an expert analyzing transcripts."

# Create a cached content object
cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      system_instruction=system_instruction,
      contents=[document], # The document(s) and other content you wish to cache
    )
)

# Display the cache details
print(f'{cache=}')

# Generate content using the cached prompt and document
response = client.models.generate_content(
  model=model_name,
  contents="Please summarize this transcript",
  config=types.GenerateContentConfig(
    cached_content=cache.name
  ))

# (Optional) Print usage metadata for insights into the API call
print(f'{response.usage_metadata=}')

# Print the generated text
print('\n\n', response.text)
```
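The `usage_metadata` printed above includes `prompt_token_count` and `cached_content_token_count`, which show how much of the prompt was served from the cache. The small sketch below computes that share. It assumes, as the API reference describes, that `prompt_token_count` already includes the cached tokens, and it treats missing values as zero. The helper name is hypothetical.

```python
from typing import Any


def cached_share(usage_metadata: Any) -> float:
    """Fraction of prompt tokens that came from cached content (0.0 to 1.0)."""
    prompt = getattr(usage_metadata, "prompt_token_count", None) or 0
    cached = getattr(usage_metadata, "cached_content_token_count", None) or 0
    if prompt == 0:
        return 0.0
    return cached / prompt
```

For example, `print(f"{cached_share(response.usage_metadata):.1%} of the prompt was cached")`.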

List caches

It's not possible to retrieve or view cached content, but you can retrieve cache metadata (name, model, display_name, usage_metadata, create_time, update_time, and expire_time).

To list metadata for all uploaded caches, use client.caches.list().

```
from google import genai

client = genai.Client()
for c in client.caches.list():
  print(c)
```

Update a cache

You can set a new ttl or expire_time for a cache. Changing anything else about the cache isn't supported.

The following example shows how to update the ttl of a cache using client.caches.update().

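```
from google import genai
from google.genai import types
import datetime

client = genai.Client()

model_name = "models/gemini-1.5-flash-002"

# Note: the API enforces a minimum token count for cached content, so a tiny
# placeholder like 'hello' may be rejected; cache a real document in practice
cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

# Set a new TTL of two hours for the cache
client.caches.update(
  name=cache.name,
  config=types.UpdateCachedContentConfig(
    ttl=f'{datetime.timedelta(hours=2).total_seconds()}s'
  )
)
```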
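The `ttl` is passed as a duration string in seconds with an `s` suffix. Because `total_seconds()` returns a float, the example above produces `'7200.0s'`. A helper that emits whole seconds may be easier to read. This is a hypothetical convenience, not an SDK function, and it assumes whole-second precision is enough for your use.

```python
import datetime


def ttl_string(delta: datetime.timedelta) -> str:
    """Format a timedelta as a whole-second duration string such as '7200s'."""
    seconds = int(delta.total_seconds())
    if seconds <= 0:
        raise ValueError("ttl must be positive")
    return f"{seconds}s"


print(ttl_string(datetime.timedelta(hours=2)))  # 7200s
```

You could then pass `types.UpdateCachedContentConfig(ttl=ttl_string(datetime.timedelta(hours=2)))`.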

Delete a cache

The caching service provides a delete operation for manually removing content from the cache. The following example shows how to delete a cache using client.caches.delete().

```
from google import genai
from google.genai import types

client = genai.Client()

model_name = "models/gemini-1.5-flash-002"

# Note: real cached content must meet the API's minimum token count;
# 'hello' is only a placeholder
cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
      contents=['hello']
    )
)

# Delete the cache by name
client.caches.delete(name=cache.name)
```

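Next steps

This guide shows how to use generateContent to generate text outputs from processed documents. To learn more, see the following resources:

File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
Safety guidance: Generative AI models sometimes produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs.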