Gemma 4 has been released. It accepts text, audio, and image input and offers a long context window of up to 256,000 tokens.


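Run Gemma with Hugging Face Transformers

Gemma open models can perform tasks such as generating text, summarizing, and analyzing content. This tutorial shows how to get started running Gemma with Hugging Face Transformers and how to generate text content from both text and image inputs. The Transformers Python library provides an API for accessing pre-trained generative AI models, including Gemma. For more information, see the Transformers documentation.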

Install Python packages

Install the Hugging Face libraries required to run the Gemma model and make requests.

# Install PyTorch
%pip install torch

# Install Transformers
%pip install transformers
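
Before moving on, it can help to confirm that both packages are visible to the current Python environment. The following is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name introduced here, not part of PyTorch or Transformers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages this tutorial installs.
for name in ("torch", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either package is reported as not installed, rerun the install commands above and restart the notebook runtime.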

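Generate text from text

Prompting a Gemma model with text to get a text response is the simplest way to use Gemma, and it works with nearly all Gemma variants. This section shows how to use the Hugging Face Transformers library to load and configure a Gemma model for text-to-text generation.

Load the model

Use the torch and transformers libraries to create an instance of the model execution pipeline class with Gemma. When you use a model to generate output or follow instructions, select an instruction-tuned (IT) model, which typically has it in the model ID string. With the pipeline object, you specify the Gemma variant to use and the type of task to perform, specifically "any-to-any" for multimodal generation. See the following code example.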

from transformers import pipeline

MODEL_ID = "google/gemma-4-E2B-it"

pipe = pipeline(
    task="any-to-any",
    model=MODEL_ID,
    device_map="auto",
    dtype="auto"
)

config.json: 0.00B [00:00, ?B/s]
model.safetensors:   0%|          | 0.00/10.2G [00:00<?, ?B/s]
Loading weights:   0%|          | 0/2011 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/208 [00:00<?, ?B/s]
processor_config.json: 0.00B [00:00, ?B/s]
chat_template.jinja: 0.00B [00:00, ?B/s]
tokenizer_config.json: 0.00B [00:00, ?B/s]
tokenizer.json:   0%|          | 0.00/32.2M [00:00<?, ?B/s]

Gemma supports only a few task settings for generation. For more information on the available task settings, see the Hugging Face Pipelines task() documentation. For more information on using the Pipeline class, see the Hugging Face Pipelines documentation.
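
The model-loading step above recommends an instruction-tuned variant, which typically has `it` in its model ID. The following sketch turns that naming convention into a quick check. It is only a heuristic based on the ID string: `looks_instruction_tuned` is a hypothetical helper, and the model card remains the authoritative source.

```python
import re


def looks_instruction_tuned(model_id: str) -> bool:
    """Heuristic: True if `it` appears as its own token in the model name."""
    # Look only at the part after the organization prefix, e.g. "gemma-4-E2B-it".
    name = model_id.split("/")[-1].lower()
    # Split on common separators so "it" inside a longer word does not match.
    return "it" in re.split(r"[-_.]", name)


print(looks_instruction_tuned("google/gemma-4-E2B-it"))
```

Matching whole tokens rather than substrings avoids false positives for names that merely contain the letters "it".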

Run text generation

Once the Gemma model is loaded and configured in a pipeline object, you can send prompts to the model. The following example code shows a basic request using the text parameter:

pipe(text="<|turn>user\nroses are red<turn|>\n<|turn>model\n")

Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[{'input_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\n',
  'generated_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\nThat\'s a classic phrase, often used to highlight a contrast or a truth.\n\n**"Roses are red"** is a very popular, simple, and sweet arrangement.\n\nWhat would you like to do with this phrase? Are you looking for:\n\n1. **More rhymes or phrases?**\n2. **A continuation of a thought?**\n3. **Just appreciating the simplicity?**'}]
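
The raw prompt above wraps the user text in turn markers, and the returned `generated_text` echoes the prompt before the model's reply. The following sketch builds such a prompt and strips the echoed part. The marker layout is copied from the single example above and is an assumption about the general format; `build_turn_prompt` and `strip_prompt` are hypothetical helper names.

```python
def build_turn_prompt(user_text: str) -> str:
    """Wrap one user message in the turn markers used in the example above."""
    # Assumption: marker layout is inferred from the one example prompt shown.
    return f"<|turn>user\n{user_text}<turn|>\n<|turn>model\n"


def strip_prompt(prompt: str, generated_text: str) -> str:
    """Return only the model's continuation when the output echoes the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


prompt = build_turn_prompt("roses are red")
print(repr(prompt))
```

For anything beyond a single turn, the prompt template approach described in the next section is less error-prone than assembling markers by hand.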

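Use a prompt template

When generating content with more complex prompts, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma: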

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "Roses are red..."}]
    },
]

pipe(messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'system',
    'content': [{'type': 'text', 'text': 'You are a helpful assistant.'}]},
   {'role': 'user',
    'content': [{'type': 'text', 'text': 'Roses are red...'}]}],
  'generated_text': 'Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view.'}]
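
Because the message list is plain Python data, a multi-turn chat can be managed by appending each model reply and the next user turn to the history. The following is a minimal sketch of that bookkeeping. `text_message` and `extend_conversation` are hypothetical helpers, the follow-up prompt is made up for illustration, and recording model replies under the "assistant" role is an assumption.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def text_message(role: str, text: str) -> Message:
    """Build one text-only message in the nested format used above."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def extend_conversation(
    messages: List[Message], reply: str, next_user_text: str
) -> List[Message]:
    """Return a new history with the model reply and the next user turn added."""
    # Assumption: earlier model output is recorded under the "assistant" role.
    return messages + [
        text_message("assistant", reply),
        text_message("user", next_user_text),
    ]


history = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Roses are red..."),
]
# The reply string would normally come from result[0]["generated_text"].
history_2 = extend_conversation(
    history,
    "Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view.",
    "Write one more verse.",  # hypothetical follow-up prompt
)
print(len(history_2))
```

Returning a new list instead of mutating the input keeps earlier histories intact, which is convenient when retrying a turn.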

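Generate text from image data

Starting with Gemma 3, for model sizes 4B and above, you can use image data as part of your prompt. This section shows how to use the Transformers library to load and configure a Gemma model so that it generates text output from image data and text input.

Use a prompt template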

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "This image shows"},
        ],
    },
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'image',
      'url': 'https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg'},
     {'type': 'text', 'text': 'What is shown in this image?'}]},
   {'role': 'assistant',
    'content': [{'type': 'text', 'text': 'This image shows'}]}],
  'generated_text': " a platter of Indian food, likely a meal or an assortment of dishes.\n\nHere's a breakdown of what is visible:\n\n*   **Flatbread:** There is a large, golden-brown flatbread (possibly naan or roti) dominating the center of the platter.\n*   **Dips/Sides:** There are several small bowls containing various accompaniments:\n    *   A bowl of **yellow/mustard-colored dip** (perhaps a chutney or sauce).\n    *   A bowl of **white creamy dip** (like raita or yogurt sauce).\n    *   A portion of **white rice**.\n    *   Several bowls of **curries or sauces** in different colors:\n        *   An **orange/brown curry**.\n        *   A **deep yellow/orange sauce**.\n        *   A **green sauce** (likely a chutney).\n*   **Garnish/Side Item:** In the upper right corner, there appears to be some darker, textured items, possibly fried pieces or spices.\n*   **Platter:** The food is served on a metal platter.\n\nOverall, it looks like a traditional Indian meal setup featuring bread, rice, and various flavorful sauces/curries."}]
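
In the output above, `generated_text` continues from the trailing assistant message ("This image shows") instead of repeating it. The following sketch, which assumes that behavior holds whenever `return_full_text=False` is used with a trailing assistant message, joins the two into one complete reply. `full_reply` is a hypothetical helper.

```python
from typing import Any, Dict, List


def full_reply(messages: List[Dict[str, Any]], generated_text: str) -> str:
    """Prepend a trailing assistant prefill, if any, to the generated text."""
    if not messages or messages[-1].get("role") != "assistant":
        return generated_text
    # Collect only the text parts of the final assistant message.
    prefill = "".join(
        part["text"]
        for part in messages[-1]["content"]
        if part.get("type") == "text"
    )
    return prefill + generated_text


example_messages = [
    {"role": "user", "content": [{"type": "text", "text": "What is shown in this image?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "This image shows"}]},
]
print(full_reply(example_messages, " a platter of Indian food."))
```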

You can include multiple images in a prompt by adding more "type": "image" entries to the content list.

Note: Don't use the <|image|>, <start_of_image>, or <image_soft_token> tokens in the text portion of the prompt template. Doing so produces redundant tokens and processing errors.
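
As a sketch of the multi-image case, the following helper builds one user message with an image entry per URL followed by the question text. `build_image_message` is a hypothetical helper, and the second URL is a placeholder rather than a real image.

```python
from typing import Any, Dict, List


def build_image_message(image_urls: List[str], question: str) -> Dict[str, Any]:
    """Build a user message with one image entry per URL, then the text."""
    content: List[Dict[str, str]] = [
        {"type": "image", "url": url} for url in image_urls
    ]
    # The text part carries only the question, with no image placeholder tokens.
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}


message = build_image_message(
    [
        "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg",
        "https://example.com/second-image.jpg",  # placeholder URL for illustration
    ],
    "What do these images have in common?",
)
print(len(message["content"]))
```

The resulting dictionary can be placed in the messages list in the same way as the single-image example above.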

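Generate text from audio data

With Gemma 4 and Gemma 3n, you can use audio data as part of your prompt. This section shows how to use the Transformers library to load and configure a Gemma model so that it generates text output from audio data and text input.

Use a prompt template

When generating content that includes audio, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma with audio data input: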

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."},
            {"type": "audio", "audio": "https://ai.google.dev/gemma/docs/audio/roses-are.wav"},
        ]
    }
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'text',
      'text': 'Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.'},
     {'type': 'audio',
      'audio': 'https://ai.google.dev/gemma/docs/audio/roses-are.wav'}]}],
  'generated_text': 'Roses are red, violets are blue.'}]

You can include multiple audio files in a prompt by adding more "type": "audio" entries to the content list.

Note: Don't use the <|audio|> or <audio_soft_token> tokens in the text portion of the prompt template. Doing so produces redundant tokens and processing errors.
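
One way to guard against the placeholder-token problem is to check the instruction text before building the message. The following sketch rejects text that contains any of the image or audio tokens named in the notes on this page, then builds a user message with one audio entry per URL. `find_reserved_tokens` and `build_audio_message` are hypothetical helpers, and the token list covers only the tokens mentioned here.

```python
from typing import Any, Dict, List

# Tokens that the notes on this page say must not appear in prompt text.
RESERVED_TOKENS = (
    "<|image|>",
    "<start_of_image>",
    "<image_soft_token>",
    "<|audio|>",
    "<audio_soft_token>",
)


def find_reserved_tokens(text: str) -> List[str]:
    """Return the reserved tokens that occur in `text`, in list order."""
    return [token for token in RESERVED_TOKENS if token in text]


def build_audio_message(instruction: str, audio_urls: List[str]) -> Dict[str, Any]:
    """Build a user message with the instruction and one entry per audio URL."""
    found = find_reserved_tokens(instruction)
    if found:
        raise ValueError(f"Remove reserved tokens from the prompt text: {found}")
    content: List[Dict[str, str]] = [{"type": "text", "text": instruction}]
    content += [{"type": "audio", "audio": url} for url in audio_urls]
    return {"role": "user", "content": content}


audio_message = build_audio_message(
    "Transcribe the following speech segment in its original language.",
    ["https://ai.google.dev/gemma/docs/audio/roses-are.wav"],
)
print(len(audio_message["content"]))
```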

Next steps

Build and explore more with Gemma models:
