Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Gemma 4 is designed to be the world's most efficient family of open-weight models.
This document shows how to use Gemma 4's thinking capabilities to generate a reasoning process before giving a final answer. You'll learn how to enable thinking mode for text-only and multimodal (image-text) tasks using the Hugging Face transformers library, and how to parse the output to separate the thinking from the answer.
This notebook runs on a T4 GPU.
Install Python packages
Install the Hugging Face libraries needed to run the Gemma model and make requests.
# Install PyTorch & other libraries
pip install torch accelerate
# Install the transformers library
pip install transformers
Load the model
Use the transformers library to create a processor and a model instance with the AutoProcessor and AutoModelForMultimodalLM classes, as shown in the following code example:
MODEL_ID = "google/gemma-4-E2B-it" # @param ["google/gemma-4-E2B-it","google/gemma-4-E4B-it", "google/gemma-4-31B-it", "google/gemma-4-26B-A4B-it"]
from transformers import AutoProcessor, AutoModelForMultimodalLM
model = AutoModelForMultimodalLM.from_pretrained(MODEL_ID, dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)
Single text inference with thinking
To generate a response using the model's thinking capabilities, pass enable_thinking=True to apply_chat_template. The processor inserts the correct thinking tokens into the prompt, instructing the model to think before answering. The table below shows the resulting prompt structure for each model size.
| Model size | Thinking state | Prompt structure / model output |
|---|---|---|
| E2B/E4B | OFF | <|turn>user\n[Prompt]<turn|>\n<|turn>model |
| E2B/E4B | ON | <|turn>system\n<|think|><turn|>\n<|turn>user\n[Prompt]<turn|>\n<|turn>model |
| 26B/31B | OFF | ⚠️ <|turn>user\n[Prompt]<turn|>\n<|turn>model\n<|channel>thought\n<channel|> |
| 26B/31B | ON | <|turn>system\n<|think|><turn|>\n<|turn>user\n[Prompt]<turn|>\n<|turn>model |
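To make the table concrete, the sketch below assembles the same prompt strings by hand. It is illustrative only: build_prompt and its parameters are hypothetical names, not part of transformers, and the strings simply mirror the table rows. The real chat template may add details the table omits, such as a leading <bos> token.

```python
# Illustrative sketch: in practice, processor.apply_chat_template builds the prompt.
# build_prompt is a hypothetical helper that mirrors the rows of the table.

def build_prompt(prompt: str, thinking: bool, large_model: bool = False) -> str:
    """Assemble a prompt string following the structure listed in the table."""
    parts = []
    if thinking:
        # A system turn carrying the <|think|> token switches thinking on.
        parts.append("<|turn>system\n<|think|><turn|>\n")
    parts.append(f"<|turn>user\n{prompt}<turn|>\n")
    parts.append("<|turn>model")
    if large_model and not thinking:
        # Per the table, 26B/31B with thinking off ends with an empty thought channel.
        parts.append("\n<|channel>thought\n<channel|>")
    return "".join(parts)


print(build_prompt("What is the water formula?", thinking=True))
```

In normal use you would not build these strings yourself. Passing enable_thinking to apply_chat_template, as in the example that follows, lets the processor pick the right structure for the loaded model.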
from transformers import TextStreamer
message = [
{
"role": "user", "content": "What is the water formula?"
}
]
text = processor.apply_chat_template(message, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]
streamer = TextStreamer(processor)
outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=1024)
<bos><|turn>system
<|think|><turn|>
<|turn>user
What is the water formula?<turn|>
<|turn>model
<|channel>thought
Thinking Process:
1. **Analyze the Request:** The user is asking, "What is the water formula?"
2. **Interpret the Ambiguity:** The term "water formula" is highly ambiguous. It could refer to several things:
* **Chemical Formula:** \\(\text{H}_2\text{O}\\) (the molecular formula).
* **Physical/Scientific Formula:** The properties of water (e.g., polarity, hydrogen bonding, specific heat).
* **Biological/Chemical Formula:** The role of water in biochemistry (e.g., solvation, hydration shells).
* **Mathematical Formula:** Equations relating to water cycles, thermodynamics, or fluid dynamics.
* **Culinary/Recipe Formula:** A specific recipe.
* **A specific context missing:** Is this related to chemistry, physics, biology, engineering, or something else?
3. **Determine the Most Likely Interpretation:** In a general knowledge context, when someone asks for a "formula" related to water, they most often mean the fundamental chemical structure.
4. **Formulate the Primary Answer (Chemical):** State the chemical formula and briefly explain what it represents.
5. **Address the Ambiguity (Provide Context/Alternatives):** Since the request is vague, offer other possible interpretations so the user can clarify.
6. **Review Constraints (Self-Correction/Identity Check):** I am Gemma 4, an LLM. I can provide factual information based on my knowledge base. No tools are needed. The answer should be clear and comprehensive.
7. **Draft the Response Structure:**
* Start with the most direct answer (Chemical Formula).
* Explain the components and bonding.
* Offer context for other possible meanings. (Drafting the final response based on this structure.)<channel|>The term "water formula" can be interpreted in several ways, depending on the context (chemistry, physics, biology, etc.).
Here are the most common interpretations:
### 1. Chemical Formula (The Most Common Answer)
The fundamental chemical formula for water is:
$$\text{H}_2\text{O}$$
* **H** represents the element **Hydrogen**.
* **O** represents the element **Oxygen**.
This formula indicates that one molecule of water is composed of two hydrogen atoms covalently bonded to one oxygen atom.
### 2. Molecular Structure and Bonding
Beyond the formula, the "formula" also describes how the atoms interact:
* **Polarity:** Water is a highly **polar** molecule. Oxygen is much more electronegative than hydrogen, meaning it pulls the shared electrons closer to itself. This creates a partial negative charge (\\(\delta^-\\)) on the oxygen atom and partial positive charges (\\(\delta^+\\)) on the hydrogen atoms.
* **Hydrogen Bonding:** The polarity allows water molecules to form **hydrogen bonds** with each other. This strong attraction is responsible for water's unique physical properties, such as its high specific heat, its ability to dissolve many substances (its role as a universal solvent), and its high surface tension.
### 3. Formula in Physics/Thermodynamics
If you are referring to a physical formula, it might relate to:
* **Specific Heat Capacity:** The amount of energy required to raise the temperature of a given mass of water by one degree.
* **Density and Volume:** Equations relating the mass, volume, and density of water under different temperatures and pressures.
***
**In summary, if you are asking for the basic chemical makeup, the formula is \\(\text{H}_2\text{O}\\).**
If you are looking for a specific formula in a different field (like a mathematical equation or a biological reaction), please provide more context!<turn|>
After the text is generated, the response contains both the thinking block and the final answer, delimited by special tokens. You can use the parse_response utility to extract them into a dictionary that holds the reasoning under the thinking key and the final answer under the content key.
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
result = processor.parse_response(response)
for key, value in result.items():
    if key == "role":
        print(f"Role: {value}")
    elif key == "thinking":
        print(f"\n=== Thoughts ===\n{value}")
    elif key == "content":
        print(f"\n=== Answer ===\n{value}")
    elif key == "tool_calls":
        print(f"\n=== Tool Calls ===\n{value}")
    else:
        print(f"\n{key}: {value}...\n")
Role: assistant
=== Thoughts ===
Thinking Process:
1. **Analyze the Request:** The user is asking, "What is the water formula?"
2. **Interpret the Ambiguity:** The term "water formula" is highly ambiguous. It could refer to several things:
* **Chemical Formula:** \\(\text{H}_2\text{O}\\) (the molecular formula).
* **Physical/Scientific Formula:** The properties of water (e.g., polarity, hydrogen bonding, specific heat).
* **Biological/Chemical Formula:** The role of water in biochemistry (e.g., solvation, hydration shells).
* **Mathematical Formula:** Equations relating to water cycles, thermodynamics, or fluid dynamics.
* **Culinary/Recipe Formula:** A specific recipe.
* **A specific context missing:** Is this related to chemistry, physics, biology, engineering, or something else?
3. **Determine the Most Likely Interpretation:** In a general knowledge context, when someone asks for a "formula" related to water, they most often mean the fundamental chemical structure.
4. **Formulate the Primary Answer (Chemical):** State the chemical formula and briefly explain what it represents.
5. **Address the Ambiguity (Provide Context/Alternatives):** Since the request is vague, offer other possible interpretations so the user can clarify.
6. **Review Constraints (Self-Correction/Identity Check):** I am Gemma 4, an LLM. I can provide factual information based on my knowledge base. No tools are needed. The answer should be clear and comprehensive.
7. **Draft the Response Structure:**
* Start with the most direct answer (Chemical Formula).
* Explain the components and bonding.
* Offer context for other possible meanings. (Drafting the final response based on this structure.)
=== Answer ===
The term "water formula" can be interpreted in several ways, depending on the context (chemistry, physics, biology, etc.).
Here are the most common interpretations:
### 1. Chemical Formula (The Most Common Answer)
The fundamental chemical formula for water is:
$$\text{H}_2\text{O}$$
* **H** represents the element **Hydrogen**.
* **O** represents the element **Oxygen**.
This formula indicates that one molecule of water is composed of two hydrogen atoms covalently bonded to one oxygen atom.
### 2. Molecular Structure and Bonding
Beyond the formula, the "formula" also describes how the atoms interact:
* **Polarity:** Water is a highly **polar** molecule. Oxygen is much more electronegative than hydrogen, meaning it pulls the shared electrons closer to itself. This creates a partial negative charge (\\(\delta^-\\)) on the oxygen atom and partial positive charges (\\(\delta^+\\)) on the hydrogen atoms.
* **Hydrogen Bonding:** The polarity allows water molecules to form **hydrogen bonds** with each other. This strong attraction is responsible for water's unique physical properties, such as its high specific heat, its ability to dissolve many substances (its role as a universal solvent), and its high surface tension.
### 3. Formula in Physics/Thermodynamics
If you are referring to a physical formula, it might relate to:
* **Specific Heat Capacity:** The amount of energy required to raise the temperature of a given mass of water by one degree.
* **Density and Volume:** Equations relating the mass, volume, and density of water under different temperatures and pressures.
***
**In summary, if you are asking for the basic chemical makeup, the formula is \\(\text{H}_2\text{O}\\).**
If you are looking for a specific formula in a different field (like a mathematical equation or a biological reaction), please provide more context!
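The delimiters visible in the raw streamed output (<|channel>thought ... <channel|> around the reasoning, and <turn|> at the end of the turn) can also be split by hand. The following is a minimal sketch of that idea using a regular expression. split_thinking is a hypothetical helper, not the implementation of parse_response, and it assumes a single thought block at the start of the response.

```python
import re

# Delimiters taken from the raw model output shown earlier in this guide.
THOUGHT_RE = re.compile(r"<\|channel>thought\n(.*?)<channel\|>", re.DOTALL)
END_OF_TURN = "<turn|>"


def split_thinking(raw: str) -> dict:
    """Split a raw response into its thinking block and final answer."""
    match = THOUGHT_RE.search(raw)
    thinking = match.group(1).strip() if match else ""
    # Everything after the closing <channel|> is the answer.
    answer = (raw[match.end():] if match else raw).strip()
    if answer.endswith(END_OF_TURN):
        # Drop the end-of-turn marker.
        answer = answer[: -len(END_OF_TURN)].rstrip()
    return {"thinking": thinking, "content": answer}


sample = "<|channel>thought\nThe user wants H2O.<channel|>The formula is H2O.<turn|>"
print(split_thinking(sample))
```

For real use, parse_response remains the better choice, since it also returns the role and any tool calls.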
Single image inference
Using thinking mode with visual data works much the same way. You can include an image as part of the message list. Pass the image to the processor along with the formatted text, and the model analyzes the visual input before answering.
from PIL import Image
import matplotlib.pyplot as plt
prompt = "What is shown in this image?"
image_url = "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"
# download image
!wget -q {image_url} -O image.png
image = Image.open("image.png")
# Display all images
print("=== Downloaded image ===")
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.imshow(image)
ax.set_title("Image 1")
ax.axis("off")
plt.tight_layout()
plt.show()
message = [
{
"role": "user", "content": [
{"type": "image"},
{"type": "text", "text": prompt}
]
}
]
text = processor.apply_chat_template(message, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=1024)
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
result = processor.parse_response(response)
for key, value in result.items():
    if key == "role":
        print(f"Role: {value}")
    elif key == "thinking":
        print(f"\n=== Thoughts ===\n{value}")
    elif key == "content":
        print(f"\n=== Answer ===\n{value}")
    elif key == "tool_calls":
        print(f"\n=== Tool Calls ===\n{value}")
    else:
        print(f"\n{key}: {value}...\n")
=== Downloaded image ===

Role: assistant
=== Thoughts ===
Here's a thinking process to arrive at the suggested description:
1. **Analyze the Image Content:**
* **Dominant Feature:** A large suspension bridge with distinctive red/orange trusswork. This is immediately recognizable as the Golden Gate Bridge.
* **Setting:** Water (a large body of water, likely the bay/ocean).
* **Foreground/Midground:**
* Water in the immediate foreground (dark blue/green).
* A rocky outcrop/island in the middle foreground.
* A substantial stone/brick structure to the left (part of the bridge approach or a related structure).
* **Background:** Hazy landmasses/hills behind the bridge.
* **Sky:** Clear, light blue sky.
* **Atmosphere/Lighting:** Bright daylight, clear weather.
2. **Identify Key Elements for Description:**
* The Bridge (Golden Gate Bridge).
* The Water/Bay.
* The Coastal/Land features.
3. **Draft the Description (Focusing on detail and clarity):**
* *Initial thought:* It's a picture of the Golden Gate Bridge over the water.
* *Refinement (Adding detail):* The image shows the iconic red suspension bridge spanning a body of water. There is a rocky island in the foreground and some structures on the shore.
* *Enhancement (Adding context and visual appeal):* Mention the color, the scale, and the atmosphere.
4. **Final Polish and Structure (Grouping similar ideas):** (This leads to the final structured response.)
* *Identification:* State clearly what the main subject is.
* *Setting:* Describe the environment (water, sky).
* *Details:* Mention specific foreground and background elements.
5. **Review against the original prompt:** (The prompt asks "What is shown in this image?") The description accurately reflects the visual evidence. (Self-Correction: Ensure the identification is confident, which it is, based on the structure and color.)
=== Answer ===
This image shows the **Golden Gate Bridge** spanning a body of water, likely the San Francisco Bay.
Here is a breakdown of what is visible:
* **The Golden Gate Bridge:** The iconic red/orange suspension bridge dominates the frame, stretching across the water. Its distinctive structure and massive towers are clearly visible.
* **Water:** A large expanse of blue-green water fills the foreground and midground.
* **Foreground Elements:** In the immediate foreground, there is a dark, rocky outcrop or small island.
* **Shoreline/Structures:** To the left, there are stone and brick structures, suggesting the land or approach to the bridge.
* **Background:** Hazy hills or landmasses are visible in the distance behind the bridge.
* **Atmosphere:** The scene is brightly lit under a clear, light blue sky, suggesting fair weather.
In summary, it is a scenic photograph capturing the majestic view of the Golden Gate Bridge.
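Both examples in this guide print the parsed result with the same loop. One way to avoid repeating it is a small formatting helper, sketched below. format_parsed is a hypothetical name, and the dictionary keys (role, thinking, content, tool_calls) are assumed from the loop used in this guide rather than from a documented schema.

```python
# Hypothetical helper that factors out the print loop used in both examples.
SECTION_LABELS = {
    "thinking": "Thoughts",
    "content": "Answer",
    "tool_calls": "Tool Calls",
}


def format_parsed(result: dict) -> str:
    """Render a parsed response dictionary as labeled text sections."""
    lines = []
    for key, value in result.items():
        if key == "role":
            lines.append(f"Role: {value}")
        elif key in SECTION_LABELS:
            lines.append(f"\n=== {SECTION_LABELS[key]} ===\n{value}")
        else:
            # Unknown keys are shown as-is rather than dropped.
            lines.append(f"\n{key}: {value}")
    return "\n".join(lines)


example = {
    "role": "assistant",
    "thinking": "Identify the bridge.",
    "content": "The Golden Gate Bridge.",
}
print(format_parsed(example))
```

Returning a string instead of printing directly makes the helper easy to reuse for logging or for display in a notebook cell.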
Summary and next steps
In this guide, you learned how to use the thinking capabilities of Gemma 4 models to generate a reasoning process before the final answer. You learned how to:
- Enable thinking mode by passing enable_thinking=True to apply_chat_template.
- Use TextStreamer to watch the thinking process in real time.
- Parse the combined output into separate thinking and content (answer) fields using parse_response.
- Apply thinking capabilities to multimodal (image + text) tasks.
Next steps
Explore more of Gemma 4's capabilities.