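Run Gemma with Hugging Face Transformers

Generating text, summarizing, and analyzing content are just some of the tasks you can accomplish with Gemma open models. This guide shows how to run Gemma with Hugging Face Transformers, using text, image, and audio input to generate text content. The Transformers Python library provides an API for accessing pre-trained generative AI models, including Gemma. For more information, see the Transformers documentation.

Install Python packages

Install the Hugging Face libraries required to run the Gemma model and send it requests.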

# %pip is a notebook magic; in a regular shell, use "pip install" instead.
# Install PyTorch
%pip install torch

# Install Transformers
%pip install transformers
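
To confirm that the install step worked before loading a model, a small check like the following can help. This is a minimal sketch that uses only the Python standard library; the helper names `installed_version` and `report` are illustrative and not part of Transformers.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("torch", "transformers")) -> dict:
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one line per package so a missing install is easy to spot.
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If either package is reported as not installed, rerun the install commands above in the same environment that runs your notebook or script.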

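Generate text from text

Prompting a Gemma model with text to get a text response is the simplest way to use Gemma and works with nearly all Gemma variants. This section shows how to use the Hugging Face Transformers library to load and configure a Gemma model for text-to-text generation.

Load the model

Use the torch and transformers libraries to create an instance of a model-execution pipeline with Gemma. When you use a model to generate output or follow instructions, you typically choose an instruction-tuned (IT) model, which has it in the model ID string. With the pipeline object, you specify the Gemma variant to use and the type of task to perform (here "any-to-any", for multimodal generation), as shown in the following code example.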

from transformers import pipeline

MODEL_ID = "google/gemma-4-E2B-it"

pipe = pipeline(
    task="any-to-any",
    model=MODEL_ID,
    device_map="auto",
    dtype="auto"
)

config.json: 0.00B [00:00, ?B/s]
model.safetensors:   0%|          | 0.00/10.2G [00:00<?, ?B/s]
Loading weights:   0%|          | 0/2011 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/208 [00:00<?, ?B/s]
processor_config.json: 0.00B [00:00, ?B/s]
chat_template.jinja: 0.00B [00:00, ?B/s]
tokenizer_config.json: 0.00B [00:00, ?B/s]
tokenizer.json:   0%|          | 0.00/32.2M [00:00<?, ?B/s]
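
The instruction-tuned naming convention can be checked before starting a multi-gigabyte download. The following is a minimal sketch, assuming that an IT model ID contains `it` as one of its dash-separated parts. That is a heuristic based on the ID used in this guide, not a rule enforced by the library. The helper names are illustrative, and the ID without the `-it` suffix is used only as a hypothetical counterexample.

```python
def is_instruction_tuned(model_id: str) -> bool:
    """Heuristic: an ID counts as instruction tuned if "it" is one of its dash-separated parts."""
    name = model_id.split("/")[-1].lower()
    return "it" in name.split("-")


def build_pipeline_kwargs(model_id: str) -> dict:
    """Collect the keyword arguments used in this guide's pipeline() call."""
    if not is_instruction_tuned(model_id):
        raise ValueError(f"{model_id!r} does not look like an instruction-tuned (IT) model")
    return {
        "task": "any-to-any",  # multimodal generation task used in this guide
        "model": model_id,
        "device_map": "auto",
        "dtype": "auto",
    }


kwargs = build_pipeline_kwargs("google/gemma-4-E2B-it")
print(kwargs)
```

Passing the result as `pipeline(**kwargs)` would be equivalent to the call shown above, with the ID check performed first.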

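Gemma supports only a few task settings for generation. For more information on the available task settings, see the Hugging Face Pipelines task() documentation. For more information on using the Pipeline class, see the Hugging Face Pipelines documentation.

Run text generation

After you load and configure the Gemma model in a pipeline object, you can send prompts to the model. The following example code shows a basic request using the text parameter.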

pipe(text="<|turn>user\nroses are red<turn|>\n<|turn>model\n")

Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[{'input_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\n',
  'generated_text': '<|turn>user\nroses are red<turn|>\n<|turn>model\nThat\'s a classic phrase, often used to highlight a contrast or a truth.\n\n**"Roses are red"** is a very popular, simple, and sweet arrangement.\n\nWhat would you like to do with this phrase? Are you looking for:\n\n1. **More rhymes or phrases?**\n2. **A continuation of a thought?**\n3. **Just appreciating the simplicity?**'}]
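
The result above echoes the prompt at the start of `generated_text`. The sketch below wraps user text in the same turn markers used in the request and strips the echoed prompt from a result. The helper names are illustrative, and the result dictionary is simulated data with the same shape as the output above, not real model output.

```python
def build_single_turn_prompt(user_text: str) -> str:
    """Wrap user text in the turn markers used in the basic request above."""
    return f"<|turn>user\n{user_text}<turn|>\n<|turn>model\n"


def strip_prompt(result: dict) -> str:
    """Return only the newly generated text when the result echoes the prompt."""
    prompt = result["input_text"]
    generated = result["generated_text"]
    if isinstance(prompt, str) and generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


prompt = build_single_turn_prompt("roses are red")

# Simulated result for illustration only; a real call would be pipe(text=prompt).
fake_results = [{"input_text": prompt, "generated_text": prompt + "Violets are blue."}]
print(strip_prompt(fake_results[0]))
```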

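Use a prompt template

When generating content with more complex prompts, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma.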

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "Roses are red..."}]
    },
]

pipe(messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'system',
    'content': [{'type': 'text', 'text': 'You are a helpful assistant.'}]},
   {'role': 'user',
    'content': [{'type': 'text', 'text': 'Roses are red...'}]}],
  'generated_text': 'Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view.'}]
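
For a multi-turn chat, the model's reply and the next user turn are appended to the same message list before the next call. The following is a minimal sketch of that bookkeeping. It assumes the model's turns use the `assistant` role name that appears in this guide's image example; the helper names and the follow-up question are hypothetical.

```python
def text_message(role: str, text: str) -> dict:
    """Build one chat message in the role/content format used in this guide."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def extend_conversation(messages: list, reply: str, next_user_text: str) -> list:
    """Return a new list with the model reply and the next user turn appended."""
    return messages + [
        text_message("assistant", reply),
        text_message("user", next_user_text),
    ]


messages = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Roses are red..."),
]

# Reply text copied from the example output above.
reply = "Roses are red,\nViolets are blue,\nHow lovely to see\nA beautiful view."

# The follow-up request is a hypothetical second user turn.
messages = extend_conversation(messages, reply, "Now write a version about tulips.")
print([m["role"] for m in messages])
```

The extended list can then be passed to the pipeline in the same way as the original `messages`.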

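Generate text from image data

Starting with Gemma 3, models of size 4B and larger accept image data as part of a prompt. This section shows how to use the Transformers library to load and configure a Gemma model to generate text output from image data and text input.

Use a prompt template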

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "This image shows"},
        ],
    },
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'image',
      'url': 'https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg'},
     {'type': 'text', 'text': 'What is shown in this image?'}]},
   {'role': 'assistant',
    'content': [{'type': 'text', 'text': 'This image shows'}]}],
  'generated_text': " a platter of Indian food, likely a meal or an assortment of dishes.\n\nHere's a breakdown of what is visible:\n\n*   **Flatbread:** There is a large, golden-brown flatbread (possibly naan or roti) dominating the center of the platter.\n*   **Dips/Sides:** There are several small bowls containing various accompaniments:\n    *   A bowl of **yellow/mustard-colored dip** (perhaps a chutney or sauce).\n    *   A bowl of **white creamy dip** (like raita or yogurt sauce).\n    *   A portion of **white rice**.\n    *   Several bowls of **curries or sauces** in different colors:\n        *   An **orange/brown curry**.\n        *   A **deep yellow/orange sauce**.\n        *   A **green sauce** (likely a chutney).\n*   **Garnish/Side Item:** In the upper right corner, there appears to be some darker, textured items, possibly fried pieces or spices.\n*   **Platter:** The food is served on a metal platter.\n\nOverall, it looks like a traditional Indian meal setup featuring bread, rice, and various flavorful sauces/curries."}]

You can include multiple images in a prompt by adding more "type": "image" entries to the content list.
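
The sketch below builds a message list with several image entries and an optional assistant prefill like the "This image shows" turn in the example above. The helper names are illustrative, and the second image URL is a hypothetical placeholder.

```python
def image_prompt(image_urls, question, prefill=None):
    """Build messages with one image entry per URL, followed by the text question."""
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": question})
    messages = [{"role": "user", "content": content}]
    if prefill is not None:
        # A partial assistant turn that the model continues from.
        messages.append(
            {"role": "assistant", "content": [{"type": "text", "text": prefill}]}
        )
    return messages


def join_prefill(prefill: str, continuation: str) -> str:
    """Rebuild the full answer, since the generated text continues the prefill."""
    return prefill + continuation


messages = image_prompt(
    [
        "https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg",
        "https://example.com/second-image.jpg",  # hypothetical second image
    ],
    "What is shown in these images?",
    prefill="These images show",
)
print(len(messages[0]["content"]))
```

Because the generated text in the example above starts with a leading space, joining it to the prefill with `join_prefill` yields a complete sentence.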

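Note: Do not use the <|image|>, <start_of_image>, or <image_soft_token> tokens in the text portion of the prompt template. Doing so generates duplicate tokens and processing errors.

Generate text from audio data

With Gemma 4 and Gemma 3n, you can use audio data as part of a prompt. This section shows how to use the Transformers library to load and configure a Gemma model to generate text output from audio data and text input.

Use a prompt template

When generating content with audio, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and is the required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to construct a prompt template for Gemma with audio data input.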

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512
gen_kwargs = dict(generation_config=config)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."},
            {"type": "audio", "audio": "https://ai.google.dev/gemma/docs/audio/roses-are.wav"},
        ]
    }
]

pipe(text=messages, return_full_text=False, generate_kwargs=gen_kwargs)

[{'input_text': [{'role': 'user',
    'content': [{'type': 'text',
      'text': 'Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.'},
     {'type': 'audio',
      'audio': 'https://ai.google.dev/gemma/docs/audio/roses-are.wav'}]}],
  'generated_text': 'Roses are red, violets are blue.'}]

You can include multiple audio files in a prompt by adding more `"type": "audio"` entries to the `content` list.
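
A minimal sketch of a multi-audio request follows. It mirrors the example above, where the instruction text comes first and each audio entry stores its location under the `audio` key (image entries use `url` instead). The helper name and the second audio URL are hypothetical.

```python
def audio_prompt(instruction: str, audio_urls) -> list:
    """Build a user message with the instruction followed by one entry per audio file."""
    content = [{"type": "text", "text": instruction}]
    content.extend({"type": "audio", "audio": url} for url in audio_urls)
    return [{"role": "user", "content": content}]


messages = audio_prompt(
    "Transcribe the following speech segments in their original language.",
    [
        "https://ai.google.dev/gemma/docs/audio/roses-are.wav",
        "https://example.com/second-clip.wav",  # hypothetical second audio file
    ],
)
print([part["type"] for part in messages[0]["content"]])
```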

Note: Do not use the <|audio|> or <audio_soft_token> tokens in the text portion of the prompt template. Doing so generates duplicate tokens and processing errors.
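
One way to guard against this mistake is to scan the text parts of a message list before sending it. The following is a minimal sketch that checks for the image and audio placeholder tokens named in this guide's notes. The helper names are illustrative, and the token list covers only the tokens mentioned here.

```python
# Placeholder tokens that this guide says must not appear in prompt text.
RESERVED_TOKENS = (
    "<|image|>",
    "<start_of_image>",
    "<image_soft_token>",
    "<|audio|>",
    "<audio_soft_token>",
)


def find_reserved_tokens(text: str) -> list:
    """Return the reserved tokens that occur in the given text."""
    return [token for token in RESERVED_TOKENS if token in text]


def check_messages(messages: list) -> None:
    """Raise ValueError if any text part of any message contains a reserved token."""
    for message in messages:
        for part in message["content"]:
            if part.get("type") == "text":
                found = find_reserved_tokens(part["text"])
                if found:
                    raise ValueError(f"Remove reserved tokens from prompt text: {found}")


good = [{"role": "user", "content": [{"type": "text", "text": "Describe the clip."}]}]
check_messages(good)  # passes silently
```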

Next steps

Build and explore more with Gemma models.