Gemma 4 launched with text, audio, and image input, and a long context window of up to 256K tokens.

Run Gemma with Hugging Face Transformers

Generating text, summarizing, and analyzing content are just some of the tasks you can accomplish with Gemma open models. This tutorial shows how to get started running Gemma with Hugging Face Transformers, using both text and image input to generate text content. The Transformers Python library provides an API for accessing pre-trained generative AI models, including Gemma. For more information, see the Transformers documentation.

Install Python packages

Install the Hugging Face libraries required for running the Gemma model and making requests.

# Install PyTorch
%pip install torch

# Install Transformers
%pip install transformers
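
After installing, it can help to confirm which package versions the environment actually picked up. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is an illustrative assumption and not part of torch or transformers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for this guide; it only reads package metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages installed in the step above.
for name in ("torch", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either line reports "not installed", rerun the matching install command before loading a model.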

Generate text from text

The simplest way to use Gemma is to prompt a Gemma model with text and get a text response. This approach works with almost all Gemma variants. This section shows how to use the Hugging Face Transformers library to load and configure a Gemma model for text-to-text generation.

Load model

Use the torch and transformers libraries to create a pipeline instance that runs a Gemma model. When you use a model to generate output or follow instructions, choose an instruction-tuned (IT) model, which usually has the string it in its model ID. With the pipeline object, you specify the Gemma variant you want to use and the type of task you want to perform, specifically "any-to-any" for multimodal generation, as shown in the following code example:

from transformers import pipeline

MODEL_ID = "google/gemma-4-E2B-it"

pipe = pipeline(
    task="any-to-any",
    model=MODEL_ID,
    device_map="auto",
    dtype="auto"
)

config.json: 0.00B [00:00, ?B/s]
model.safetensors:   0%|          | 0.00/10.2G [00:00<?, ?B/s]
Loading weights:   0%|          | 0/2011 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/208 [00:00<?, ?B/s]
processor_config.json: 0.00B [00:00, ?B/s]
chat_template.jinja: 0.00B [00:00, ?B/s]
tokenizer_config.json: 0.00B [00:00, ?B/s]
tokenizer.json:   0%|          | 0.00/32.2M [00:00<?, ?B/s]

Gemma supports only a few task settings for generation. For the available task settings, see the Hugging Face Pipelines task() documentation. For more information on using the Pipeline class, see the Hugging Face Pipelines documentation.
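
The earlier advice to pick an instruction-tuned variant can be turned into a quick sanity check before downloading several gigabytes of weights. This is only a heuristic sketch based on the naming convention described above; the function name `looks_instruction_tuned` is an assumption, and a model ID can break the convention.

```python
def looks_instruction_tuned(model_id: str) -> bool:
    """Heuristic: True if the model name has a hyphen-separated "it" segment.

    Hypothetical helper; it inspects only the ID string and does not
    contact the Hugging Face Hub.
    """
    name = model_id.rsplit("/", 1)[-1].lower()
    return "it" in name.split("-")


# The model ID used in this guide ends with the "it" marker.
print(looks_instruction_tuned("google/gemma-4-E2B-it"))

# A made-up ID without the marker, used only for illustration.
print(looks_instruction_tuned("example-org/some-base-model"))
```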

Run text generation

After the Gemma model is loaded and configured in a pipeline object, you can send prompts to the model. The following example code shows a basic request using the text parameter:

pipe(text="<|turn>user\nroses are red<turn|>\n<|turn>model\n")

Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[{'input_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\n',
  'generated_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\nThat\'s a classic phrase, often used to highlight a contrast or a truth.\n\n**"Roses are red"** is a very popular, simple, and sweet arrangement.\n\nWhat would you like to do with this phrase? Are you looking for:\n\n1. **More rhymes or phrases?**\n2. **A continuation of a thought?**\n3. **Just appreciating the simplicity?**'}]
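
The raw request above wraps the user text in turn markers, and the output echoes the whole prompt before the model's answer. As a rough sketch, the two helpers below build that single-turn string and strip the echoed prompt from the result. The marker strings are copied from the example above; the helper names `build_raw_prompt` and `strip_prompt` are illustrative assumptions.

```python
def build_raw_prompt(user_text: str) -> str:
    """Wrap one user message in the turn markers shown in the example above."""
    return f"<|turn>user\n{user_text}<turn|>\n<|turn>model\n"


def strip_prompt(prompt: str, generated_text: str) -> str:
    """Drop the echoed prompt from the start of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


prompt = build_raw_prompt("roses are red")

# With the pipeline loaded earlier, usage might look like this:
# outputs = pipe(text=prompt)
# answer = strip_prompt(prompt, outputs[0]["generated_text"])
```

Hand-built strings like this are easy to get wrong, so this approach is best kept to quick single-turn experiments.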

Use a prompt template

When generating content with more complex prompts, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model (written as assistant in the messages lists in this guide), and it is the required format for managing multi-turn conversations with Gemma models. The following example code shows how to construct a prompt template for Gemma:

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "Roses are red..."}]
    },
]

pipe(messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'system',
    'content': [{'type': 'text', 'text': 'You are a helpful assistant.'}]},
   {'role': 'user',
    'content': [{'type': 'text', 'text': 'Roses are red...'}]}],
  'generated_text': 'Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view.'}]
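
Because the messages list is the conversation state, a multi-turn exchange amounts to appending the model's reply and the next user message before calling the pipeline again. The sketch below shows one way to do that in plain Python. The helper names are assumptions, and it assumes model replies are recorded with the "assistant" role, as in the image example later in this guide.

```python
def text_message(role: str, text: str) -> dict:
    """Build one message in the role/content structure used in this guide."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def extend_conversation(messages: list, reply_text: str, next_user_text: str) -> list:
    """Return a new list with the model reply and the next user turn appended.

    The input list is left unchanged so earlier states can be reused.
    """
    return messages + [
        text_message("assistant", reply_text),
        text_message("user", next_user_text),
    ]


history = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Roses are red..."),
]

# With the pipeline loaded earlier, a follow-up turn might look like this:
# outputs = pipe(history, return_full_text=False, generate_kwargs=gen_kwargs)
# history = extend_conversation(history, outputs[0]["generated_text"],
#                               "Now write one about tulips.")
# outputs = pipe(history, return_full_text=False, generate_kwargs=gen_kwargs)
```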

Generate text from image data

Starting with Gemma 3, for model sizes 4B and above, you can use image data as part of your prompt. This section shows how to use the Transformers library to load and configure a Gemma model that takes image data and text input and generates text output.

Use a prompt template

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "This image shows"},
        ],
    },
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'image',
      'url': 'https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg'},
     {'type': 'text', 'text': 'What is shown in this image?'}]},
   {'role': 'assistant',
    'content': [{'type': 'text', 'text': 'This image shows'}]}],
  'generated_text': " a platter of Indian food, likely a meal or an assortment of dishes.\n\nHere's a breakdown of what is visible:\n\n*   **Flatbread:** There is a large, golden-brown flatbread (possibly naan or roti) dominating the center of the platter.\n*   **Dips/Sides:** There are several small bowls containing various accompaniments:\n    *   A bowl of **yellow/mustard-colored dip** (perhaps a chutney or sauce).\n    *   A bowl of **white creamy dip** (like raita or yogurt sauce).\n    *   A portion of **white rice**.\n    *   Several bowls of **curries or sauces** in different colors:\n        *   An **orange/brown curry**.\n        *   A **deep yellow/orange sauce**.\n        *   A **green sauce** (likely a chutney).\n*   **Garnish/Side Item:** In the upper right corner, there appears to be some darker, textured items, possibly fried pieces or spices.\n*   **Platter:** The food is served on a metal platter.\n\nOverall, it looks like a traditional Indian meal setup featuring bread, rice, and various flavorful sauces/curries."}]

You can include multiple images in your prompt by adding more "type": "image" entries to the content list.

Note: Do not use the <|image|>, <start_of_image>, or <image_soft_token> tokens in the text portion of a prompt template, because doing so produces redundant tokens and processing errors.
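
As a small sketch of the multi-image case, the helper below builds a single user message with one image entry per URL, followed by the question text. The function name `image_prompt` is an assumption, and the second URL is a placeholder rather than a real image.

```python
def image_prompt(image_urls: list, question: str) -> list:
    """Build a one-message prompt with several images followed by a question."""
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


messages = image_prompt(
    [
        "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg",
        "https://example.com/second-image.jpg",  # placeholder URL, not a real image
    ],
    "What do these images have in common?",
)

# With the pipeline loaded earlier, the request might look like this:
# pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)
```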

Generate text from audio data

With Gemma 4 and Gemma 3n, you can use audio data as part of your prompt. This section shows how to use the Transformers library to load and configure a Gemma model that takes audio data and text input and generates text output.

Use a prompt template

When generating content with audio, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and it is the required format for managing multi-turn conversations with Gemma models. The following example code shows how to construct a prompt template for Gemma with audio data input:

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."},
            {"type": "audio", "audio": "https://ai.google.dev/gemma/docs/audio/roses-are.wav"},
        ]
    }
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'text',
      'text': 'Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.'},
     {'type': 'audio',
      'audio': 'https://ai.google.dev/gemma/docs/audio/roses-are.wav'}]}],
  'generated_text': 'Roses are red, violets are blue.'}]

You can include multiple audio files in your prompt by adding more "type": "audio" entries to the content list.

Note: Do not use the <|audio|> or <audio_soft_token> tokens in the text portion of a prompt template, because doing so produces redundant tokens and processing errors.
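
One way to guard against the token mistakes described in the notes in this guide is to scan the text parts of a messages list before sending it. The following is a minimal sketch: the token list is taken from those notes, while the names `RESERVED_TOKENS` and `find_reserved_tokens` are illustrative assumptions and not part of the Transformers API.

```python
# Tokens that the notes in this guide say to keep out of prompt text.
RESERVED_TOKENS = (
    "<|image|>",
    "<start_of_image>",
    "<image_soft_token>",
    "<|audio|>",
    "<audio_soft_token>",
)


def find_reserved_tokens(messages: list) -> list:
    """Return the reserved tokens found in the text parts of a messages list."""
    found = []
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            # Treat plain-string content as a single text part.
            content = [{"type": "text", "text": content}]
        for part in content:
            if part.get("type") != "text":
                continue
            for token in RESERVED_TOKENS:
                if token in part.get("text", "") and token not in found:
                    found.append(token)
    return found


bad = [{"role": "user", "content": [{"type": "text", "text": "<|audio|> Transcribe this."}]}]
problems = find_reserved_tokens(bad)
if problems:
    print("Remove these tokens from the prompt text:", problems)
```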

Next steps

Build and explore more with Gemma models: