Explore document processing capabilities with the Gemini API

The Gemini API supports PDF input, including long documents (up to 3,600 pages). Gemini models process PDFs with native vision, so they can understand both the text and the image content of a document. With native PDF vision support, Gemini models can:

Analyze diagrams, charts, and tables inside documents
Extract information into structured output formats
Answer questions about visual and text content in documents
Summarize documents
Transcribe document content (for example, to HTML), preserving layout and formatting, for use in downstream applications (such as RAG pipelines)

This tutorial demonstrates how to use the Gemini API with PDF documents. All output is text only.

Before you begin: Set up your project and API key

Before calling the Gemini API, you need to set up your project and configure your API key.

Get and secure your API key

You need an API key to call the Gemini API. If you don't already have one, create a key in Google AI Studio.

We strongly recommend that you do not check an API key into your version control system.

Store your API key in a secrets store such as Google Cloud Secret Manager.

This tutorial assumes that you access your API key as an environment variable.
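Because the rest of this tutorial reads the key from the environment, it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch of that check. The helper name `load_api_key` is hypothetical (it is not part of the SDK), and the variable name `API_KEY` is assumed to match the one used in the configuration step of this tutorial.

```python
import os


def load_api_key(var_name="API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    `load_api_key` is a hypothetical helper, not part of the Gemini SDK.
    `environ` defaults to os.environ and can be replaced in tests.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "create a key in Google AI Studio and export it first."
        )
    return key
```

The returned value could then be passed to `genai.configure(api_key=...)` instead of indexing `os.environ` directly.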

Install the SDK package and configure your API key

The Python SDK for the Gemini API is contained in the google-generativeai package.

Install the dependency using pip:
```
pip install -U google-generativeai
```
Import the package and configure the service with your API key:
```
import os
import google.generativeai as genai

genai.configure(api_key=os.environ['API_KEY'])
```

Prompting with PDFs

This guide demonstrates how to upload and process PDFs using the File API or by including them as inline data.

Technical details

Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types:

PDF - application/pdf
JavaScript - application/x-javascript, text/javascript
Python - application/x-python, text/x-python
TXT - text/plain
HTML - text/html
CSS - text/css
Markdown - text/md
CSV - text/csv
XML - text/xml
RTF - text/rtf
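The list above can be turned into a small lookup so that a MIME type is chosen consistently before a document is sent. This is only a sketch: the file extensions are assumptions (the list names formats, not extensions), the helper `mime_type_for` is hypothetical, and where the list gives two MIME types for a format the sketch arbitrarily picks the `text/...` variant.

```python
from pathlib import Path

# Extension-to-MIME mapping derived from the list above.
# The extensions themselves are assumptions made for this sketch.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".js": "text/javascript",
    ".py": "text/x-python",
    ".txt": "text/plain",
    ".html": "text/html",
    ".css": "text/css",
    ".md": "text/md",
    ".csv": "text/csv",
    ".xml": "text/xml",
    ".rtf": "text/rtf",
}


def mime_type_for(path):
    """Return the MIME type for a supported document, or None otherwise."""
    return SUPPORTED_MIME_TYPES.get(Path(path).suffix.lower())
```

Returning `None` for unknown extensions lets the caller decide whether to reject the file or convert it first.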

Each document page is equivalent to 258 tokens.
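As a rough illustration of what the per-page figure implies, the sketch below multiplies a page count by 258 and rejects counts above the 3,600-page limit. The function name is hypothetical, and the result covers document pages only: prompt text and system instructions add their own tokens, so treat it as an estimate rather than the billed count.

```python
TOKENS_PER_PAGE = 258  # per-page cost stated above
MAX_PAGES = 3600       # page limit stated above


def estimate_document_tokens(page_count):
    """Estimate the input tokens consumed by the pages of one document."""
    if not 0 <= page_count <= MAX_PAGES:
        raise ValueError(f"page_count must be between 0 and {MAX_PAGES}")
    return page_count * TOKENS_PER_PAGE


print(estimate_document_tokens(3600))  # 928800
```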

While there is no specific limit on the number of pixels in a document beyond the model's context window, larger pages are scaled down to a maximum resolution of 3072x3072 pixels while preserving their original aspect ratio, and smaller pages are scaled up to 768x768 pixels. Lower-resolution pages bring no cost reduction other than bandwidth, and higher-resolution pages bring no performance improvement.

For best results:

Rotate pages to the correct orientation before uploading.
Avoid blurry pages.
If using a single page, place the text prompt after the page.

PDF input

For PDF payloads under 20 MB, you can choose between uploading base64-encoded documents or directly uploading locally stored files.
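A quick size check can help decide whether a PDF is small enough to send inline. The sketch below rests on two assumptions that the text above does not confirm: that "20 MB" means 20 × 1024 × 1024 bytes, and that the limit applies to the Base64-encoded payload (padded Base64 turns every 3 input bytes into 4 output bytes). Both helper names are hypothetical.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def base64_size(raw_size):
    """Size in bytes of `raw_size` bytes after padded Base64 encoding."""
    return 4 * ((raw_size + 2) // 3)


def can_send_inline(pdf_size, other_request_bytes=0):
    """Return True if the encoded PDF plus the rest of the request is under the limit.

    Assumes the limit is measured on the encoded payload.
    """
    return base64_size(pdf_size) + other_request_bytes < INLINE_LIMIT_BYTES
```

When the check fails, the File API described later in this guide is the alternative.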

Base64-encoded documents

You can process PDF documents directly from URLs. The following code snippet shows how:

```
import httpx
import base64

model = genai.GenerativeModel("gemini-1.5-flash")
doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Retrieve and encode the PDF
doc_data = base64.standard_b64encode(httpx.get(doc_url).content).decode("utf-8")

prompt = "Summarize this document"

response = model.generate_content([{'mime_type': 'application/pdf', 'data': doc_data}, prompt])
print(response.text)
```

Locally stored PDFs

For locally stored PDFs, use the following approach:

```
import base64

model = genai.GenerativeModel("gemini-1.5-flash")
doc_path = "/path/to/file.pdf"  # Replace with the actual path to your local PDF

# Read and encode the local file
with open(doc_path, "rb") as doc_file:
    doc_data = base64.standard_b64encode(doc_file.read()).decode("utf-8")

prompt = "Summarize this document"

response = model.generate_content([{'mime_type': 'application/pdf', 'data': doc_data}, prompt])

print(response.text)
```

Large PDFs

You can use the File API to upload a document of any size. Always use the File API when the total request size (including the files, text prompt, system instructions, and so on) is larger than 20 MB.

Note: The File API lets you store up to 20 GB of files per project, with a maximum size of 2 GB per file. Files are stored for 48 hours. During that time you can access them with your API key, but you cannot download them from the API. The File API is available at no cost in all regions where the Gemini API is available.
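The quotas in the note can be checked locally before an upload is attempted. The following sketch assumes binary units for "GB" (the note does not say which definition applies) and uses hypothetical helper names. `project_bytes_used` is a number the caller must track or compute separately.

```python
import datetime

MAX_FILE_BYTES = 2 * 1024**3              # 2 GB per file (binary units assumed)
MAX_PROJECT_BYTES = 20 * 1024**3          # 20 GB per project (binary units assumed)
RETENTION = datetime.timedelta(hours=48)  # storage period stated in the note


def check_upload(file_size, project_bytes_used=0):
    """Raise ValueError if the upload would exceed the quotas in the note."""
    if file_size > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 2 GB per-file limit")
    if project_bytes_used + file_size > MAX_PROJECT_BYTES:
        raise ValueError("upload would exceed the 20 GB per-project limit")


def expected_expiry(upload_time):
    """Return when a file uploaded at `upload_time` is due to be removed."""
    return upload_time + RETENTION
```

Computing the expiry up front makes it easier to re-upload a document before a long-running job loses access to it.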

Call media.upload to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent.

Large PDFs from URLs

Use the File API for large PDF files that are available from URLs. This simplifies uploading and processing these documents directly through their URLs:

```
import io
import httpx

model = genai.GenerativeModel("gemini-1.5-flash")
long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"  # Replace with the actual URL of your large PDF

# Retrieve the PDF and upload it using the File API.
# The file-like object is passed positionally as the first argument.
doc_data = io.BytesIO(httpx.get(long_context_pdf_path).content)
sample_doc = genai.upload_file(doc_data, mime_type='application/pdf')

prompt = "Summarize this document"

response = model.generate_content([sample_doc, prompt])
print(response.text)
```

Large PDFs stored locally

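```
import pathlib
import google.generativeai as genai

media = pathlib.Path("media")  # Assumed directory holding the sample files

model = genai.GenerativeModel("gemini-1.5-flash")
# Upload the local PDF with the File API, then reference it in the prompt
sample_pdf = genai.upload_file(media / "test.pdf")
response = model.generate_content(["Give me a summary of this pdf file.", sample_pdf])
print(response.text)
```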

You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get. Only the name (and by extension, the uri) is unique.

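```
import pathlib
import google.generativeai as genai

media = pathlib.Path("media")  # Assumed directory holding the sample files

myfile = genai.upload_file(media / "poem.txt")
file_name = myfile.name
print(file_name)  # "files/*"

# Fetch the stored file's metadata by name
myfile = genai.get_file(file_name)
print(myfile)
```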

Multiple PDFs

The Gemini API can process multiple PDF documents in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.

```
import io
import httpx

model = genai.GenerativeModel("gemini-1.5-flash")

doc_url_1 = "https://arxiv.org/pdf/2312.11805"  # Replace with the URL to your first PDF
doc_url_2 = "https://arxiv.org/pdf/2403.05530"  # Replace with the URL to your second PDF

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

# The file-like objects are passed positionally as the first argument
sample_pdf_1 = genai.upload_file(doc_data_1, mime_type='application/pdf')
sample_pdf_2 = genai.upload_file(doc_data_2, mime_type='application/pdf')

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = model.generate_content([sample_pdf_1, sample_pdf_2, prompt])
print(response.text)
```
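Whether several documents fit into one request can be estimated before anything is uploaded. The sketch below is a rough budget check under stated assumptions: it uses the 258-tokens-per-page figure from this guide, the function name is hypothetical, and `context_window` must be supplied by the caller because this guide does not state a value for any model.

```python
TOKENS_PER_PAGE = 258  # per-page cost stated in this guide


def fits_in_context(page_counts, prompt_tokens, context_window):
    """Return True if all documents plus the prompt fit in the context window.

    page_counts: iterable with the number of pages of each PDF.
    prompt_tokens: estimated tokens of the text prompt.
    context_window: the model's input token limit (caller-supplied).
    """
    document_tokens = sum(page_counts) * TOKENS_PER_PAGE
    return document_tokens + prompt_tokens <= context_window
```

For example, `fits_in_context([60, 77], prompt_tokens=30, context_window=1_000_000)` evaluates 137 pages plus the prompt against a hypothetical one-million-token window.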

List files

You can list all files uploaded using the File API, along with their URIs, by using files.list.

```
import google.generativeai as genai

print("My files:")
for f in genai.list_files():
    print("  ", f.name)
```

Delete files

Files uploaded using the File API are automatically deleted after 2 days. You can also delete them manually using files.delete.

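```
import pathlib
import google.generativeai as genai
from google.api_core import exceptions

media = pathlib.Path("media")  # Assumed directory holding the sample files

myfile = genai.upload_file(media / "poem.txt")

myfile.delete()

try:
    # Using a deleted file in a request raises an error.
    model = genai.GenerativeModel("gemini-1.5-flash")
    result = model.generate_content([myfile, "Describe this file."])
except exceptions.PermissionDenied:
    pass
```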

Context caching with PDFs

```
import io
import httpx
import google.generativeai as genai
from google.generativeai import caching

# Define the path to the PDF document (or use a URL)
long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"  # Replace with the URL of your large PDF
doc_data = io.BytesIO(httpx.get(long_context_pdf_path).content)

# Upload the PDF document using the File API
# (the file-like object is passed positionally as the first argument)
document = genai.upload_file(doc_data, mime_type='application/pdf')

# Specify the model name and system instruction for caching
model_name = "gemini-1.5-flash-002"  # Ensure this matches the model you intend to use
system_instruction = "You are an expert analyzing transcripts."

# Create a cached content object
cache = caching.CachedContent.create(
    model=model_name,
    system_instruction=system_instruction,
    contents=[document],  # The document(s) and other content you wish to cache
)

# Display the cache details
print(cache)

# Initialize a generative model from the cached content
model = genai.GenerativeModel.from_cached_content(cache)

# Generate content using the cached prompt and document
response = model.generate_content("Please summarize this transcript")

# (Optional) Print usage metadata for insights into the API call
print(response.usage_metadata)

# Print the generated text
print(response.text)
```

List caches

It's not possible to retrieve or view cached content, but you can retrieve cache metadata (name, model, display_name, usage_metadata, create_time, update_time, and expire_time).

To list metadata for all uploaded caches, use CachedContent.list():

```
for c in caching.CachedContent.list():
    print(c)
```

Update a cache

You can set a new ttl or expire_time for a cache. Changing anything else about the cache isn't supported.

The following example shows how to update the ttl of a cache using CachedContent.update():

```
import datetime

cache.update(ttl=datetime.timedelta(hours=2))
```
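Since a cache accepts either a relative ttl or an absolute expire_time, it can be useful to see how the two relate. The sketch below converts a ttl into the equivalent UTC timestamp. The helper name is hypothetical, and passing the result as `cache.update(expire_time=...)` is an assumption based on the statement above that expire_time can be set; check the SDK reference for the exact accepted type.

```python
import datetime


def expire_time_from_ttl(ttl, now=None):
    """Return the absolute UTC expiry equivalent to a relative ttl.

    `now` defaults to the current UTC time and can be fixed in tests.
    """
    if ttl <= datetime.timedelta(0):
        raise ValueError("ttl must be positive")
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now + ttl
```

An absolute timestamp is convenient when several caches should expire together, for instance at the end of a batch job.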

Delete a cache

The caching service provides a delete operation for manually removing content from the cache. The following example shows how to delete a cache using CachedContent.delete():

```
cache.delete()
```

What's next

This guide shows how to use generateContent and how to generate text output from processed documents. To learn more, see the following resources:

File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
Safety guidance: Generative AI models sometimes produce unexpected output, such as output that is inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such output.