Run Gemma with Hugging Face Transformers

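Gemma open models can generate text, summarize content, and analyze content. This tutorial shows how to get started running Gemma with Hugging Face Transformers, using both text and image input to generate text. The Transformers Python library provides an API for accessing pre-trained generative AI models, including Gemma. For more information, see the Transformers documentation.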

Install Python packages

Install the Hugging Face libraries required to run the Gemma model and make requests.

# Install PyTorch
%pip install torch

# Install Transformers
%pip install transformers
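
Before loading a model, it can help to confirm that both packages are visible to the current Python environment. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of Transformers or PyTorch.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two packages installed in the step above.
missing = missing_packages(["torch", "transformers"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("torch and transformers are available.")
```

If either name is reported as missing, rerun the install commands above in the same environment or notebook kernel.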

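Generate text from text

Prompting a Gemma model with text to get a text response is the simplest way to use Gemma, and it works with nearly all Gemma variants. This section shows how to use the Hugging Face Transformers library to load and configure a Gemma model for text generation.

Load the model

Use the torch and transformers libraries to create an instance of a pipeline class that runs a Gemma model. To have the model generate output or follow instructions, select an instruction-tuned (IT) model, which typically has it in its model ID string. With the pipeline object, you specify the Gemma variant to use and the type of task to run (specifically "any-to-any" for multimodal generation), as shown in the following code example: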

from transformers import pipeline

MODEL_ID = "google/gemma-4-E2B-it"

pipe = pipeline(
    task="any-to-any",
    model=MODEL_ID,
    device_map="auto",
    dtype="auto"
)

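Gemma supports only a few task settings for generation. For more information about the available task settings, see the Hugging Face Pipelines task() documentation. For more information about using the Pipeline class, see the Hugging Face Pipelines documentation.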
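
The guidance above about choosing an instruction-tuned model can be expressed as a small naming check. This is only a heuristic sketch based on the statement that IT model IDs usually contain it; the helper name looks_instruction_tuned is hypothetical, and a model's card remains the authoritative source.

```python
def looks_instruction_tuned(model_id: str) -> bool:
    """Heuristic: True if the model name has 'it' as a hyphen-separated part."""
    # Drop the organization prefix, e.g. "google/".
    name = model_id.rsplit("/", 1)[-1].lower()
    return "it" in name.split("-")


# Model ID taken from the loading example above.
print(looks_instruction_tuned("google/gemma-4-E2B-it"))
# Hypothetical ID without the marker, for contrast.
print(looks_instruction_tuned("example-org/some-model-base"))
```

Splitting on hyphens rather than searching for the substring avoids false matches on names that merely contain the letters "it".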

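Run text generation

Once the Gemma model is loaded and configured in a pipeline object, you can send prompts to the model. The following code example shows a basic request using the text parameter: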

pipe(text="<|turn>user\nroses are red<turn|>\n<|turn>model\n")

Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[{'input_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\n',
  'generated_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\nThat\'s a classic phrase, often used to highlight a contrast or a truth.\n\n**"Roses are red"** is a very popular, simple, and sweet arrangement.\n\nWhat would you like to do with this phrase? Are you looking for:\n\n1. **More rhymes or phrases?**\n2. **A continuation of a thought?**\n3. **Just appreciating the simplicity?**'}]
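
The output above echoes the prompt at the start of generated_text. The sketch below builds the same single-turn prompt string and strips that echo from a result entry. The turn markers are copied from the example request and are assumed to apply only to this one-user-turn case; both helper names are hypothetical.

```python
def format_single_turn(user_text: str) -> str:
    """Wrap one user message in the turn markers used in the example above."""
    return f"<|turn>user\n{user_text}<turn|>\n<|turn>model\n"


def strip_prompt(result: dict) -> str:
    """Return only the newly generated text from one pipeline result entry."""
    prompt = result["input_text"]
    generated = result["generated_text"]
    # Only remove the prompt when it really is a prefix of the output.
    if generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


prompt = format_single_turn("roses are red")
# Stand-in for one entry of the list returned by pipe(text=prompt).
sample_result = {"input_text": prompt, "generated_text": prompt + "Violets are blue."}
print(strip_prompt(sample_result))
```

For anything beyond a single turn, the prompt template approach in the next section is less error-prone than formatting marker strings by hand.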

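Use a prompt template

When generating content with more complex prompts, use a prompt template to structure your request. A prompt template lets you specify input for specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following code example shows how to construct a prompt template for Gemma: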

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "Roses are red..."}]
    },
]

pipe(messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'system',
    'content': [{'type': 'text', 'text': 'You are a helpful assistant.'}]},
   {'role': 'user',
    'content': [{'type': 'text', 'text': 'Roses are red...'}]}],
  'generated_text': 'Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view.'}]
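
Because the prompt template is the format used for multi-turn chat, one way to continue a conversation is to append the model's reply and the next user message to the messages list. The following is a minimal sketch, assuming return_full_text=False so that generated_text holds only the reply, and assuming the reply is stored under the assistant role as in the image example later on this page. The helper names are hypothetical.

```python
def text_message(role: str, text: str) -> dict:
    """Build one chat message whose content is a single text part."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def extend_history(messages: list, results: list, next_user_text: str) -> list:
    """Return a new history with the model reply and the next user turn added."""
    reply = results[0]["generated_text"]
    return messages + [
        text_message("assistant", reply),
        text_message("user", next_user_text),
    ]


history = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Roses are red..."),
]
# Stand-in for the list returned by pipe(...) with return_full_text=False.
results = [{"generated_text": "Roses are red,\nViolets are blue."}]
history = extend_history(history, results, "Now write a second verse.")
print(len(history))
```

The extended history can then be passed to the pipeline in place of messages for the next turn.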

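Generate text from image data

Starting with Gemma 3, for model sizes 4B and above, you can include image data in your prompt. This section shows how to use the Transformers library to load and configure a Gemma model to generate text output from image data and text input.

Use a prompt template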

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "This image shows"},
        ],
    },
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'image',
      'url': 'https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg'},
     {'type': 'text', 'text': 'What is shown in this image?'}]},
   {'role': 'assistant',
    'content': [{'type': 'text', 'text': 'This image shows'}]}],
  'generated_text': " a platter of Indian food, likely a meal or an assortment of dishes.\n\nHere's a breakdown of what is visible:\n\n*   **Flatbread:** There is a large, golden-brown flatbread (possibly naan or roti) dominating the center of the platter.\n*   **Dips/Sides:** There are several small bowls containing various accompaniments:\n    *   A bowl of **yellow/mustard-colored dip** (perhaps a chutney or sauce).\n    *   A bowl of **white creamy dip** (like raita or yogurt sauce).\n    *   A portion of **white rice**.\n    *   Several bowls of **curries or sauces** in different colors:\n        *   An **orange/brown curry**.\n        *   A **deep yellow/orange sauce**.\n        *   A **green sauce** (likely a chutney).\n*   **Garnish/Side Item:** In the upper right corner, there appears to be some darker, textured items, possibly fried pieces or spices.\n*   **Platter:** The food is served on a metal platter.\n\nOverall, it looks like a traditional Indian meal setup featuring bread, rice, and various flavorful sauces/curries."}]
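
In the request above, the final assistant message ("This image shows") acts as a prefill, and the generated_text in the output continues from it rather than repeating it. The sketch below joins the two into one complete answer. The helper name join_prefill is hypothetical, and the shortened result is a stand-in for real pipeline output.

```python
def join_prefill(messages: list, results: list) -> str:
    """Prepend a trailing assistant prefill, if any, to the generated text."""
    prefill = ""
    last = messages[-1]
    if last["role"] == "assistant":
        # Concatenate the text parts of the prefilled assistant message.
        prefill = "".join(
            part["text"] for part in last["content"] if part["type"] == "text"
        )
    return prefill + results[0]["generated_text"]


messages = [
    {"role": "user", "content": [{"type": "text", "text": "What is shown in this image?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "This image shows"}]},
]
# Shortened stand-in for the output shown above.
results = [{"generated_text": " a platter of Indian food."}]
print(join_prefill(messages, results))
```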

To include multiple images in your prompt, add additional "type": "image" entries to the content list.
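
As a sketch of the multi-image case, the helper below builds a user message with one image entry per URL, followed by the text question, mirroring the entry order of the single-image example. The helper name image_message and the second URL are hypothetical placeholders.

```python
def image_message(image_urls: list, question: str) -> dict:
    """Build a user message with one image part per URL, then the question."""
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}


message = image_message(
    [
        "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg",
        "https://example.com/second-image.jpg",  # hypothetical placeholder URL
    ],
    "What do these images have in common?",
)
print([part["type"] for part in message["content"]])
```

The resulting dictionary can be placed in a messages list and passed to the pipeline in the same way as the single-image request.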

Note: Do not use the <|image|>, <start_of_image>, or <image_soft_token> tokens in the text portion of your prompt template. Doing so creates redundant tokens and causes processing errors.

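Generate text from audio data

Gemma 4 and Gemma 3n support audio data in prompts. This section shows how to use the Transformers library to load and configure a Gemma model to generate text output from audio data and text input.

Use a prompt template

Use a prompt template to structure requests that include audio. A prompt template lets you specify input for specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following code example shows how to construct a prompt template for Gemma with audio data input: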

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."},
            {"type": "audio", "audio": "https://ai.google.dev/gemma/docs/audio/roses-are.wav"},
        ]
    }
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'text',
      'text': 'Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.'},
     {'type': 'audio',
      'audio': 'https://ai.google.dev/gemma/docs/audio/roses-are.wav'}]}],
  'generated_text': 'Roses are red, violets are blue.'}]

To include multiple audio files in your prompt, add additional "type": "audio" entries to the content list.
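
A multi-clip request can be assembled the same way as the single-clip example: the text instruction first, then one audio entry per source. The following is a minimal sketch; the helper name audio_message and the second URL are hypothetical placeholders.

```python
def audio_message(instruction: str, audio_sources: list) -> dict:
    """Build a user message with the instruction, then one audio part per source."""
    content = [{"type": "text", "text": instruction}]
    content.extend({"type": "audio", "audio": src} for src in audio_sources)
    return {"role": "user", "content": content}


message = audio_message(
    "Transcribe each speech segment in its original language.",
    [
        "https://ai.google.dev/gemma/docs/audio/roses-are.wav",
        "https://example.com/second-clip.wav",  # hypothetical placeholder URL
    ],
)
print(len(message["content"]))
```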

Note: Do not use the <|audio|> or <audio_soft_token> tokens in the text portion of your prompt template. Doing so creates redundant tokens and causes processing errors.
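
The notes in the image and audio sections both warn against placing media tokens in prompt text. A simple pre-flight check, sketched below, scans the text parts of a messages list for those token strings before a request is sent. The token list is copied from the two notes; the helper name find_reserved_tokens is hypothetical.

```python
RESERVED_MEDIA_TOKENS = (
    "<|image|>",
    "<start_of_image>",
    "<image_soft_token>",
    "<|audio|>",
    "<audio_soft_token>",
)


def find_reserved_tokens(messages: list) -> list:
    """Return the reserved media tokens that appear in any text part."""
    found = []
    for message in messages:
        for part in message["content"]:
            if part.get("type") != "text":
                continue  # Only text parts are checked; media entries are skipped.
            for token in RESERVED_MEDIA_TOKENS:
                if token in part["text"] and token not in found:
                    found.append(token)
    return found


messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "Describe <start_of_image> this picture."}],
    }
]
print(find_reserved_tokens(messages))
```

An empty list means none of the listed tokens were found in the text parts.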

Next steps

Build and explore more with Gemma models: