Basic text inference with Gemma

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used in the Gemini models. Gemma 4 is designed to be the world's most efficient family of open models.

This document is a guide to basic text inference with Gemma 4 using the Hugging Face transformers library. It covers environment setup, model loading, and several text generation scenarios: single-turn prompts, structured multi-turn conversations, and system instructions.

This notebook runs on a T4 GPU.

Install Python packages

Install the Hugging Face libraries required to run the Gemma model and make requests.

# Install PyTorch & other libraries
pip install torch accelerate

# Install the transformers library
pip install transformers

Dialog is a library for manipulating and displaying conversations.

pip install dialog
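Before loading the model, it can help to confirm that the packages above are actually present in the active environment. The following is a minimal sketch using only the Python standard library; the helper name installed_versions is hypothetical, and the package names are simply the ones passed to pip above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the pip commands above.
REQUIRED = ["torch", "accelerate", "transformers", "dialog"]

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs to be installed before continuing.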

Load model

Use the transformers library to load the pipeline.

MODEL_ID = "google/gemma-4-E2B-it" # @param ["google/gemma-4-E2B-it","google/gemma-4-E4B-it", "google/gemma-4-31B-it", "google/gemma-4-26B-A4B-it"]

from transformers import pipeline

txt_pipe = pipeline(
    task="text-generation",
    model=MODEL_ID,
    device_map="auto",
    dtype="auto"
)

Loading weights:   0%|          | 0/2011 [00:00<?, ?it/s]

Run text generation

After loading and configuring the Gemma model in a pipeline object, you can send prompts to the model. The following code example shows a basic request using the text_inputs parameter:

output = txt_pipe(text_inputs="<|turn>user\nRoses are..<turn|>\n<|turn>model\n")
print(output[0]['generated_text'])

Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
<|turn>user
Roses are..<turn|>
<|turn>model
Here are a few ways to complete the phrase "Roses are...":

**Classic/Poetic:**

* **Roses are red.** (The most famous completion, though it usually goes "Roses are red, Violets are blue.")
* **Roses are beautiful.**
* **Roses are fragrant.**

**Simple/Direct:**

* **Roses are lovely.**
* **Roses are soft.**

**If you want a specific tone, let me know! 😊**
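The output above echoes the prompt before the model's answer. If only the answer is needed, the prompt prefix can be removed in plain Python. The sketch below uses a hard-coded stand-in string instead of a real model call, and extract_reply is a hypothetical helper name. Later examples in this guide pass return_full_text=False to get the same effect from the pipeline directly.

```python
def extract_reply(full_text, prompt):
    """Return only the text generated after the prompt."""
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    # Fall back to the full text if the prompt is not echoed verbatim.
    return full_text


prompt = "<|turn>user\nRoses are..<turn|>\n<|turn>model\n"

# Stand-in for output[0]['generated_text']; no model is called here.
full_text = prompt + "Roses are red."

reply = extract_reply(full_text, prompt)
print(reply)
```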

Use the Dialog library

import dialog
from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512

conv = dialog.Conversation(
    dialog.User("Roses are...")
)
output = txt_pipe(text_inputs=conv.as_text(), return_full_text=False, generation_config=config)
conv += dialog.Model(output[0]['generated_text'])

print(conv.as_text())
conv.show()

<|turn>user
Roses are...<turn|>
<|turn>model
Here are a few ways to complete the phrase "Roses are...":

**Focusing on their beauty:**

* **Roses are beautiful.**
* **Roses are gorgeous.**

**Focusing on their scent:**

* **Roses are fragrant.**
* **Roses are sweet-smelling.**

**Focusing on their symbolism (if you want a deeper meaning):**

* **Roses are love.**
* **Roses are romantic.**

**Focusing on a general observation:**

* **Roses are lovely.**
* **Roses are wonderful.**

**Which completion do you like best, or were you thinking of a specific meaning?**
<dialog._src.widget.Conversation object at 0x7f1bb1a5d8b0>

Use a prompt template

When generating content with more complex prompts, use a prompt template to structure your request. A prompt template lets you specify input from specific roles, such as user or model, and is the required format for managing multi-turn conversations with Gemma models. The following code example shows how to construct a prompt template for Gemma:

from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a short poem about the Kraken."},
        ]
    }
]

output = txt_pipe(messages, return_full_text=False, generation_config=config)
print(output[0]['generated_text'])

From sunless depths, a shadow stirs,
Where ocean's crushing silence blurs.
A titan sleeps in inky night,
With tentacles of dreadful might.

A hundred arms, a crushing hold,
A legend whispered, ages old.
The deep's dark king, a monstrous grace,
The Kraken claims its watery space.
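To make the link between the messages list and the raw turn markers seen elsewhere in this guide more concrete, the sketch below renders a messages list by hand. It is an approximation inferred from the printed outputs in this guide, not the tokenizer's actual chat template, which may differ in details. The function name render_messages and its add_generation_prompt flag are illustrative assumptions.

```python
def render_messages(messages, add_generation_prompt=True):
    """Approximate the raw turn format shown in this guide's outputs."""
    parts = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            text = content
        else:
            # Keep only the text parts of a structured content list.
            text = "".join(p["text"] for p in content if p.get("type") == "text")
        parts.append(f"<|turn>{message['role']}\n{text}<turn|>\n")
    if add_generation_prompt:
        # Open a model turn so the model continues from here.
        parts.append("<|turn>model\n")
    return "".join(parts)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a short poem about the Kraken."},
        ],
    }
]

print(render_messages(messages))
```

In practice the pipeline applies the model's own template when given a messages list, so a helper like this is useful mainly for inspecting or debugging prompts.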

Multi-turn conversation

In a multi-turn setup, the conversation history is preserved as a sequence of alternating user and model roles. This cumulative list serves as the model's memory, so each new output is informed by the preceding dialogue.
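The accumulation pattern can be sketched without a model at all. In the sketch below, fake_generate is a placeholder for a real call such as the pipeline, and run_turn and text_message are hypothetical helpers. The role name used for model replies is an assumption: Hugging Face chat templates conventionally use "assistant", while the raw Gemma format in this guide labels the turn "model".

```python
# Assumption: role label for model replies in a messages list.
MODEL_ROLE = "assistant"


def text_message(role, text):
    """Build one message in the role/content structure used in this guide."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def run_turn(history, user_text, generate):
    """Append the user turn, generate from the full history, append the reply."""
    history.append(text_message("user", user_text))
    reply = generate(history)
    history.append(text_message(MODEL_ROLE, reply))
    return reply


def fake_generate(history):
    """Placeholder for a real model call; reports how many messages it saw."""
    return f"(reply after seeing {len(history)} message(s))"


history = []
run_turn(history, "Write a short poem about the Kraken.", fake_generate)
run_turn(history, "Now with the Siren.", fake_generate)

for message in history:
    print(message["role"], "->", message["content"][0]["text"])
```

The second call sees three messages rather than one, which is what lets a follow-up like "Now with the Siren." be understood in context.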

import dialog
from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512

# User turn #1
conv = dialog.Conversation(
    dialog.User("Write a short poem about the Kraken.")
)

# Model response #1
output = txt_pipe(text_inputs=conv.as_text(), return_full_text=False, generation_config=config)
conv += dialog.Model(output[0]['generated_text'])

# User turn #2
conv += dialog.User("Now with the Siren.")

# Model response #2
output = txt_pipe(text_inputs=conv.as_text(), return_full_text=False, generation_config=config)
conv += dialog.Model(output[0]['generated_text'])

print(conv.as_text())
conv.show()

<|turn>user
Write a short poem about the Kraken.<turn|>
<|turn>model
In depths where sunlight fades,
A monstrous shadow plays.
The Kraken wakes, with churning tide,
A living horror, bold and wide.<turn|>
<|turn>user
Now with the Siren.<turn|>
<|turn>model
Where coral gardens sleep,
And ocean secrets keep,
The Siren calls, with liquid grace,
A haunting melody in place.
<dialog._src.widget.Conversation object at 0x7f1bac3733b0>

And here is the conversation exported as text.

Note: If you set training=True, the conversation is treated as a complete example, so the exported text always ends with <turn|>.

chat_history = conv.as_text(training=True)
print(chat_history)
print("-"*80)

# display as Conversation widget
chat_history

<|turn>user
Write a short poem about the Kraken.<turn|>
<|turn>model
In depths where sunlight fades,
A monstrous shadow plays.
The Kraken wakes, with churning tide,
A living horror, bold and wide.<turn|>
<|turn>user
Now with the Siren.<turn|>
<|turn>model
Where coral gardens sleep,
And ocean secrets keep,
The Siren calls, with liquid grace,
A haunting melody in place.<turn|>
--------------------------------------------------------------------------------
<dialog._src.widget.ConversationStr object at 0x7f1bb07fa1b0>
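Because every turn in the training export is closed with <turn|>, the text can be split back into (role, text) pairs with a regular expression. This is a minimal sketch based only on the format printed above; parse_turns is a hypothetical helper, not part of the dialog library, and it assumes the markers never appear inside message text.

```python
import re

# One turn: opening marker, role, newline, body, closing marker.
TURN_PATTERN = re.compile(r"<\|turn>(\w+)\n(.*?)<turn\|>", re.DOTALL)


def parse_turns(text):
    """Split exported conversation text into (role, body) tuples."""
    return [(role, body) for role, body in TURN_PATTERN.findall(text)]


exported = (
    "<|turn>user\nWrite a short poem about the Kraken.<turn|>\n"
    "<|turn>model\nIn depths where sunlight fades,\nA monstrous shadow plays.<turn|>\n"
)

turns = parse_turns(exported)
for role, body in turns:
    print(f"[{role}] {body!r}")
```

A final turn that is not closed with <turn|>, as in the default export, would not match this pattern and would be dropped.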

System instructions

Use the system role to provide system-level instructions.

import dialog
from transformers import GenerationConfig
config = GenerationConfig.from_pretrained(MODEL_ID)
config.max_new_tokens = 512

conv = dialog.Conversation(
    dialog.System("Speak like a pirate."),
    dialog.User("Why is the sky blue?")
)

output = txt_pipe(text_inputs=conv.as_text(), return_full_text=False, generation_config=config)
conv += dialog.Model(output[0]['generated_text'])

print(conv.as_text())
conv.show()

<|turn>system
Speak like a pirate.<turn|>
<|turn>user
Why is the sky blue?<turn|>
<|turn>model
Ahoy there! Why is the sky blue, ye ask? It be down to the way the sun's light dances through the air!

See, the sunlight we get from the sun ain't just one color; it's a whole spectrum of colors, like a treasure chest filled with all the hues of the rainbow!

Now, the Earth is surrounded by the air, and that air is full of tiny, invisible bits of gas. When the sunlight hits these gas molecules, something magical happens. The colors in that sunlight get scattered all around in every direction!

The blue light, and other colors, get scattered more easily by these air molecules than the other colors. So, when you look up at the sky, your eyes catch all that scattered blue light coming from every direction, and **that's what makes the sky appear blue to us!**

It's a grand display of physics and light, savvy? Now, hoist the colors and enjoy the view!
<dialog._src.widget.Conversation object at 0x7f1bac370110>
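The same system-plus-user structure can also be written as a messages list in the role/content form used in the prompt template section. The sketch below only builds the list. Whether the model's chat template accepts a "system" entry in this form is an assumption that this guide does not demonstrate, and build_messages is a hypothetical helper.

```python
def build_messages(user_text, system_text=None):
    """Build a messages list with an optional leading system instruction."""
    messages = []
    if system_text:
        # Assumption: the chat template recognizes a "system" role.
        messages.append(
            {"role": "system", "content": [{"type": "text", "text": system_text}]}
        )
    messages.append(
        {"role": "user", "content": [{"type": "text", "text": user_text}]}
    )
    return messages


messages = build_messages("Why is the sky blue?", system_text="Speak like a pirate.")
for message in messages:
    print(message["role"], "->", message["content"][0]["text"])
```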

Summary and next steps

In this guide, you learned how to perform basic text inference with Gemma 4 using the Hugging Face transformers library. You learned how to:

Set up the environment and install dependencies.
Load the model using the pipeline abstraction.
Run basic text generation.
Use the dialog library to track conversations.
Implement multi-turn conversations and apply system instructions.
