Gemma 4 is available with text, audio, and image input, plus long context windows of up to 256K tokens.


Run Gemma with Hugging Face Transformers


Generating text, summarizing, and analyzing content are just some of the tasks you can accomplish with Gemma open models. This tutorial shows you how to get started running Gemma with Hugging Face Transformers, using text, image, and audio input to generate text. The Transformers Python library provides an API for accessing pre-trained generative AI models, including Gemma. For more information, see the Transformers documentation.

Install Python packages

Install the Hugging Face libraries required to run the Gemma model and make requests.

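# Install PyTorch
%pip install torch

# Install Hugging Face Transformers
%pip install transformers

# Install Accelerate (used by device_map="auto" when loading the model)
%pip install accelerate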

Generate text from text

Prompting a Gemma model with text to get a text response is the simplest way to use Gemma and works with nearly all Gemma variants. This section shows how to use the Hugging Face Transformers library to load and configure a Gemma model for text-to-text generation.

Load the model

Use the Transformers library, with PyTorch as the backend, to create a pipeline instance that runs Gemma. When you use a model to generate output or follow instructions, select an instruction-tuned (IT) model, which typically has it in the model ID string. With the pipeline object, you specify the Gemma variant you want to use and the type of task to perform, specifically "any-to-any" for multimodal generation, as shown in the following code example:

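from transformers import pipeline

# Instruction-tuned (IT) Gemma 4 checkpoint; note the "it" in the model ID.
MODEL_ID = "google/gemma-4-E2B-it"

pipe = pipeline(
    task="any-to-any",   # multimodal generation task
    model=MODEL_ID,
    device_map="auto",   # let Accelerate place the weights on available devices
    dtype="auto"         # load weights in the dtype recorded in the model config
)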


Gemma supports only a few task settings for generation. For more information on the available task settings, see the task parameter section of the Hugging Face Pipelines documentation. For more information on using the Pipeline class, see the Hugging Face Pipelines documentation.

Run text generation

Once the Gemma model is loaded and configured in a pipeline object, you can send prompts to the model. The following example code shows a basic request using the text parameter:

pipe(text="<|turn>user\nroses are red<turn|>\n<|turn>model\n")

Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[{'input_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\n',
  'generated_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\nThat\'s a classic phrase, often used to highlight a contrast or a truth.\n\n**"Roses are red"** is a very popular, simple, and sweet arrangement.\n\nWhat would you like to do with this phrase? Are you looking for:\n\n1. **More rhymes or phrases?**\n2. **A continuation of a thought?**\n3. **Just appreciating the simplicity?**'}]
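The raw prompt above wraps each turn in Gemma 4 turn markup, and the returned generated_text repeats the prompt before the model's reply. The following is a minimal sketch that assumes the markup is exactly what this example shows (<|turn>, the role, a newline, the text, <turn|>, then an open model turn). The names format_turns and extract_reply are hypothetical helpers, not Transformers functions; for anything beyond quick experiments, the structured messages format in the next section lets the model's own chat template produce the markup.

```python
# Hypothetical helpers (not part of Transformers). They assume the raw
# Gemma 4 turn markup is exactly what the example prompt above shows.
TURN_START = "<|turn>"
TURN_END = "<turn|>"


def format_turns(turns):
    """Build a raw prompt from (role, text) pairs and open a model turn."""
    parts = [f"{TURN_START}{role}\n{text}{TURN_END}\n" for role, text in turns]
    # Leave the model turn open so generation continues from here.
    parts.append(f"{TURN_START}model\n")
    return "".join(parts)


def extract_reply(outputs):
    """Return only the generated reply from a text-generation result.

    For a raw string prompt, generated_text starts with the prompt itself,
    so that prefix is stripped when present.
    """
    result = outputs[0]
    prompt = result["input_text"]
    generated = result["generated_text"]
    if isinstance(prompt, str) and generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


prompt = format_turns([("user", "roses are red")])

# A shortened result with the same shape as the pipeline output above.
sample = [{"input_text": prompt, "generated_text": prompt + "That's a classic phrase."}]
print(extract_reply(sample))
```

With the pipeline loaded, pipe(text=format_turns([("user", "roses are red")])) sends the same prompt as the example above, and passing its result to extract_reply keeps only the model's answer.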

Use a prompt template

When you generate content with more complex prompts, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma:

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "Roses are red..."}]
    },
]

pipe(messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'system',
    'content': [{'type': 'text', 'text': 'You are a helpful assistant.'}]},
   {'role': 'user',
    'content': [{'type': 'text', 'text': 'Roses are red...'}]}],
  'generated_text': 'Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view.'}]
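To continue a multi-turn conversation, append the model's previous reply and the next user turn to the same message list, then call the pipeline again. A minimal sketch, assuming model turns are stored under the "assistant" role (the role the image example in this guide uses for a prefilled reply); text_message and add_exchange are hypothetical helper names, not Transformers APIs.

```python
# Hypothetical helpers for growing a chat history in the structured
# message format used above (a role plus a list of typed content parts).


def text_message(role, text):
    """Create one message whose content is a single text part."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def add_exchange(history, reply, next_user_text):
    """Return a new history with the model's reply and the next user turn.

    The input list is not modified; a new list is returned.
    """
    return history + [
        text_message("assistant", reply),
        text_message("user", next_user_text),
    ]


messages = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Roses are red..."),
]

# Suppose the first call returned the reply shown in the output above.
first_reply = "Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view."
messages = add_exchange(messages, first_reply, "Now write a second verse.")
```

The updated list can then be sent with pipe(messages, return_full_text=False, generate_kwargs=gen_kwargs) to get the next turn.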

Generate text from image data

Starting with Gemma 3, models of size 4B and larger can use image data as part of the prompt. This section shows how to use the Transformers library to load and configure a Gemma model to use image data and text input to generate text output.

Use a prompt template

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "This image shows"},
        ],
    },
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'image',
      'url': 'https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg'},
     {'type': 'text', 'text': 'What is shown in this image?'}]},
   {'role': 'assistant',
    'content': [{'type': 'text', 'text': 'This image shows'}]}],
  'generated_text': " a platter of Indian food, likely a meal or an assortment of dishes.\n\nHere's a breakdown of what is visible:\n\n*   **Flatbread:** There is a large, golden-brown flatbread (possibly naan or roti) dominating the center of the platter.\n*   **Dips/Sides:** There are several small bowls containing various accompaniments:\n    *   A bowl of **yellow/mustard-colored dip** (perhaps a chutney or sauce).\n    *   A bowl of **white creamy dip** (like raita or yogurt sauce).\n    *   A portion of **white rice**.\n    *   Several bowls of **curries or sauces** in different colors:\n        *   An **orange/brown curry**.\n        *   A **deep yellow/orange sauce**.\n        *   A **green sauce** (likely a chutney).\n*   **Garnish/Side Item:** In the upper right corner, there appears to be some darker, textured items, possibly fried pieces or spices.\n*   **Platter:** The food is served on a metal platter.\n\nOverall, it looks like a traditional Indian meal setup featuring bread, rice, and various flavorful sauces/curries."}]
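Because the last message prefills the assistant turn with "This image shows", the output above contains only the continuation (it begins with " a platter..."). A minimal sketch for rebuilding the full answer, assuming return_full_text=False always returns just the continuation as it does here; complete_prefill is a hypothetical helper name.

```python
# Hypothetical helper: rebuild the full answer when the last message is a
# prefilled assistant turn, as in the image example above.


def complete_prefill(messages, outputs):
    """Prepend any prefilled assistant text to the generated continuation."""
    last = messages[-1]
    prefix = ""
    if last["role"] == "assistant":
        # Join all text parts of the prefilled turn.
        prefix = "".join(
            part["text"] for part in last["content"] if part["type"] == "text"
        )
    return prefix + outputs[0]["generated_text"]


messages = [
    {"role": "user", "content": [{"type": "text", "text": "What is shown in this image?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "This image shows"}]},
]
# Shortened output with the same shape as the result above.
outputs = [{"generated_text": " a platter of Indian food."}]
print(complete_prefill(messages, outputs))
```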

You can include multiple images in a prompt by adding more "type": "image" entries to the content list.
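As a sketch of that pattern, the helper below builds one user message with several image parts followed by a question, matching the image-then-text order of the example above. The helper name image_question and the example.com URLs are placeholders for illustration, not part of the original guide.

```python
# Hypothetical helper: one user turn with several images and one question.


def image_question(image_urls, question):
    """Build a user message: every image part first, then the question."""
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}


messages = [
    image_question(
        [
            "https://example.com/photo-1.jpg",  # placeholder URL
            "https://example.com/photo-2.jpg",  # placeholder URL
        ],
        "What are the differences between these two images?",
    )
]
```

The resulting list is passed the same way as before, for example pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs).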

Note: Don't use <|image|>, <start_of_image>, or <image_soft_token> in the text portion of the prompt template. Doing so creates duplicate tokens and causes processing errors.

Generate text from audio data

With Gemma 4 and Gemma 3n, you can use audio data as part of the prompt. This section shows how to use the Transformers library to load and configure a Gemma model to use audio data and text input to generate text output.

Use a prompt template

When you generate content from audio, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma with audio input:

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."},
            {"type": "audio", "audio": "https://ai.google.dev/gemma/docs/audio/roses-are.wav"},
        ]
    }
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'text',
      'text': 'Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.'},
     {'type': 'audio',
      'audio': 'https://ai.google.dev/gemma/docs/audio/roses-are.wav'}]}],
  'generated_text': 'Roses are red, violets are blue.'}]

You can include multiple audio files in a prompt by adding more "type": "audio" entries to the content list.
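A minimal sketch of a multi-clip request, keeping the instruction-then-audio order of the transcription example above. The helper name transcription_request, the instruction wording, and the example.com URLs are illustrative assumptions; how well a given model handles several clips in one turn is not covered by this guide.

```python
# Hypothetical helper: one user turn with an instruction and several clips.


def transcription_request(audio_sources, instruction):
    """Build a user message: the instruction first, then each audio part."""
    content = [{"type": "text", "text": instruction}]
    content.extend({"type": "audio", "audio": src} for src in audio_sources)
    return {"role": "user", "content": content}


messages = [
    transcription_request(
        [
            "https://example.com/clip-1.wav",  # placeholder URL
            "https://example.com/clip-2.wav",  # placeholder URL
        ],
        "Transcribe each of the following speech segments in its original language.",
    )
]
```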

Note: Don't use <|audio|> or <audio_soft_token> in the text portion of the prompt template. Doing so creates duplicate tokens and causes processing errors.
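The image and audio notes in this guide list placeholder strings that should never appear in text parts. A small pre-flight check can catch them before a request is sent. This is a hypothetical helper, not a Transformers feature; it only scans text parts and assumes every message uses the list-of-parts content format shown throughout this guide.

```python
# Placeholder strings that the notes in this guide say must not appear in
# text parts; media belongs in "image" or "audio" parts instead.
FORBIDDEN_TOKENS = (
    "<|image|>",
    "<start_of_image>",
    "<image_soft_token>",
    "<|audio|>",
    "<audio_soft_token>",
)


def find_placeholder_tokens(messages):
    """Return (message_index, token) pairs for each forbidden token found."""
    hits = []
    for i, message in enumerate(messages):
        for part in message["content"]:
            if part.get("type") != "text":
                continue  # only text parts can contain stray placeholders
            for token in FORBIDDEN_TOKENS:
                if token in part["text"]:
                    hits.append((i, token))
    return hits


bad = [{"role": "user", "content": [{"type": "text", "text": "Describe <|image|> please"}]}]
good = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/a.jpg"},  # placeholder URL
    {"type": "text", "text": "Describe this image."},
]}]
print(find_placeholder_tokens(bad))   # [(0, '<|image|>')]
print(find_placeholder_tokens(good))  # []
```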

Next steps

Build and explore more with Gemma models: