|
|
在 Google Colab 中运行
|
|
|
在 GitHub 上查看源代码
|
Gemma 是一系列先进的轻量级开放式模型,其开发采用了与 Gemini 模型相同的研究成果和技术。Gemma 4 旨在成为世界上最高效的开放式权重模型系列。
本文档演示了如何使用 Gemma 4 的推理功能在提供最终答案之前生成推理过程。您将学习如何使用 Hugging Face transformers 库为纯文本任务和多模态(图片-文本)任务启用思考模式,以及如何解析输出以将思考过程与答案分开。
此笔记本将在 T4 GPU 上运行。
安装 Python 软件包
安装运行 Gemma 模型和发出请求所需的 Hugging Face 库。
# Install PyTorch & other librariespip install torch accelerate# Install the transformers librarypip install transformers
加载模型
使用 transformers 库,通过 AutoProcessor 和 AutoModelForImageTextToText 类创建 processor 和 model 的实例,如以下代码示例所示:
MODEL_ID = "google/gemma-4-E2B-it" # @param ["google/gemma-4-E2B-it","google/gemma-4-E4B-it", "google/gemma-4-31B-it", "google/gemma-4-26B-A4B-it"]
from transformers import AutoProcessor, AutoModelForMultimodalLM
model = AutoModelForMultimodalLM.from_pretrained(MODEL_ID, dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)
Loading weights: 0%| | 0/2011 [00:00<?, ?it/s]
一次包含“思考”的文本推理
如需使用模型的思考能力生成回答,请传递 enable_thinking=True,处理器会将正确的思考令牌插入提示中,指示模型在回答之前进行思考。
| 模型大小 | 思考状态 | 模板结构 / 输出 |
|---|---|---|
| E2B/E4B | 已停用 | <|turn>user\n[Prompt]<turn|>\n<|turn>model |
| E2B/E4B | 开启 | <|turn>system\n<|think|><turn|>\n<|turn>user\n[Prompt]<turn|>\n<|turn>model |
| 26B/31B | 已停用 | ⚠️ <|turn>user\n[Prompt]<turn|>\n<|turn>model\n<|channel>thought\n<channel|> |
| 26B/31B | 开启 | <|turn>system\n<|think|><turn|>\n<|turn>user\n[Prompt]<turn|>\n<|turn>model |
from transformers import TextStreamer
message = [
{
"role": "user", "content": "What is the water formula?"
}
]
text = processor.apply_chat_template(message, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]
streamer = TextStreamer(processor)
outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=1024)
<bos><|turn>system
<|think|><turn|>
<|turn>user
What is the water formula?<turn|>
<|turn>model
<|channel>thought
Thinking Process:
1. **Analyze the Request:** The user is asking, "What is the water formula?"
2. **Interpret the Ambiguity:** The term "water formula" is highly ambiguous. It could refer to several things:
* **Chemical Formula:** \\(\text{H}_2\text{O}\\) (the molecular formula).
* **Physical/Scientific Formula:** The properties of water (e.g., polarity, hydrogen bonding, specific heat).
* **Biological/Chemical Formula:** The role of water in biochemistry (e.g., solvation, hydration shells).
* **Mathematical Formula:** Equations relating to water cycles, thermodynamics, or fluid dynamics.
* **Culinary/Recipe Formula:** A specific recipe.
* **A specific context missing:** Is this related to chemistry, physics, biology, engineering, or something else?
3. **Determine the Most Likely Interpretation:** In a general knowledge context, when someone asks for a "formula" related to water, they most often mean the fundamental chemical structure.
4. **Formulate the Primary Answer (Chemical):** State the chemical formula and briefly explain what it represents.
5. **Address the Ambiguity (Provide Context/Alternatives):** Since the request is vague, offer other possible interpretations so the user can clarify.
6. **Review Constraints (Self-Correction/Identity Check):** I am Gemma 4, an LLM. I can provide factual information based on my knowledge base. No tools are needed. The answer should be clear and comprehensive.
7. **Draft the Response Structure:**
* Start with the most direct answer (Chemical Formula).
* Explain the components and bonding.
* Offer context for other possible meanings. (Drafting the final response based on this structure.)<channel|>The term "water formula" can be interpreted in several ways, depending on the context (chemistry, physics, biology, etc.).
Here are the most common interpretations:
### 1. Chemical Formula (The Most Common Answer)
The fundamental chemical formula for water is:
$$\text{H}_2\text{O}$$
* **H** represents the element **Hydrogen**.
* **O** represents the element **Oxygen**.
This formula indicates that one molecule of water is composed of two hydrogen atoms covalently bonded to one oxygen atom.
### 2. Molecular Structure and Bonding
Beyond the formula, the "formula" also describes how the atoms interact:
* **Polarity:** Water is a highly **polar** molecule. Oxygen is much more electronegative than hydrogen, meaning it pulls the shared electrons closer to itself. This creates a partial negative charge (\\(\delta^-\\)) on the oxygen atom and partial positive charges (\\(\delta^+\\)) on the hydrogen atoms.
* **Hydrogen Bonding:** The polarity allows water molecules to form **hydrogen bonds** with each other. This strong attraction is responsible for water's unique physical properties, such as its high specific heat, its ability to dissolve many substances (its role as a universal solvent), and its high surface tension.
### 3. Formula in Physics/Thermodynamics
If you are referring to a physical formula, it might relate to:
* **Specific Heat Capacity:** The amount of energy required to raise the temperature of a given mass of water by one degree.
* **Density and Volume:** Equations relating the mass, volume, and density of water under different temperatures and pressures.
***
**In summary, if you are asking for the basic chemical makeup, the formula is \\(\text{H}_2\text{O}\\).**
If you are looking for a specific formula in a different field (like a mathematical equation or a biological reaction), please provide more context!<turn|>
生成文本后,响应将包含推理块和由特殊 token 界定的最终答案。您可以使用 parse_response 实用程序轻松将它们提取到包含 thinking 和 answer 的字典中。
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
result = processor.parse_response(response)
for key, value in result.items():
if key == "role":
print(f"Role: {value}")
elif key == "thinking":
print(f"\n=== Thoughts ===\n{value}")
elif key == "content":
print(f"\n=== Answer ===\n{value}")
elif key == "tool_calls":
print(f"\n=== Tool Calls ===\n{value}")
else:
print(f"\n{key}: {value}...\n")
Role: assistant
=== Thoughts ===
Thinking Process:
1. **Analyze the Request:** The user is asking, "What is the water formula?"
2. **Interpret the Ambiguity:** The term "water formula" is highly ambiguous. It could refer to several things:
* **Chemical Formula:** \\(\text{H}_2\text{O}\\) (the molecular formula).
* **Physical/Scientific Formula:** The properties of water (e.g., polarity, hydrogen bonding, specific heat).
* **Biological/Chemical Formula:** The role of water in biochemistry (e.g., solvation, hydration shells).
* **Mathematical Formula:** Equations relating to water cycles, thermodynamics, or fluid dynamics.
* **Culinary/Recipe Formula:** A specific recipe.
* **A specific context missing:** Is this related to chemistry, physics, biology, engineering, or something else?
3. **Determine the Most Likely Interpretation:** In a general knowledge context, when someone asks for a "formula" related to water, they most often mean the fundamental chemical structure.
4. **Formulate the Primary Answer (Chemical):** State the chemical formula and briefly explain what it represents.
5. **Address the Ambiguity (Provide Context/Alternatives):** Since the request is vague, offer other possible interpretations so the user can clarify.
6. **Review Constraints (Self-Correction/Identity Check):** I am Gemma 4, an LLM. I can provide factual information based on my knowledge base. No tools are needed. The answer should be clear and comprehensive.
7. **Draft the Response Structure:**
* Start with the most direct answer (Chemical Formula).
* Explain the components and bonding.
* Offer context for other possible meanings. (Drafting the final response based on this structure.)
=== Answer ===
The term "water formula" can be interpreted in several ways, depending on the context (chemistry, physics, biology, etc.).
Here are the most common interpretations:
### 1. Chemical Formula (The Most Common Answer)
The fundamental chemical formula for water is:
$$\text{H}_2\text{O}$$
* **H** represents the element **Hydrogen**.
* **O** represents the element **Oxygen**.
This formula indicates that one molecule of water is composed of two hydrogen atoms covalently bonded to one oxygen atom.
### 2. Molecular Structure and Bonding
Beyond the formula, the "formula" also describes how the atoms interact:
* **Polarity:** Water is a highly **polar** molecule. Oxygen is much more electronegative than hydrogen, meaning it pulls the shared electrons closer to itself. This creates a partial negative charge (\\(\delta^-\\)) on the oxygen atom and partial positive charges (\\(\delta^+\\)) on the hydrogen atoms.
* **Hydrogen Bonding:** The polarity allows water molecules to form **hydrogen bonds** with each other. This strong attraction is responsible for water's unique physical properties, such as its high specific heat, its ability to dissolve many substances (its role as a universal solvent), and its high surface tension.
### 3. Formula in Physics/Thermodynamics
If you are referring to a physical formula, it might relate to:
* **Specific Heat Capacity:** The amount of energy required to raise the temperature of a given mass of water by one degree.
* **Density and Volume:** Equations relating the mass, volume, and density of water under different temperatures and pressures.
***
**In summary, if you are asking for the basic chemical makeup, the formula is \\(\text{H}_2\text{O}\\).**
If you are looking for a specific formula in a different field (like a mathematical equation or a biological reaction), please provide more context!
单张图片推理
将思考模型与视觉数据搭配使用的流程非常相似。您可以提供作为 messages 数组一部分的图片。只需确保将图片与格式化文本一起传递给处理器,模型就会在回答之前对视觉输入进行推理。
from PIL import Image
import matplotlib.pyplot as plt
prompt = "What is shown in this image?"
image_url = "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"
# download image
!wget -q {image_url} -O image.png
image = Image.open("image.png")
# Display all images
print("=== Downloaded image ===")
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.imshow(image)
ax.set_title("Image 1")
ax.axis("off")
plt.tight_layout()
plt.show()
message = [
{
"role": "user", "content": [
{"type": "image"},
{"type": "text", "text": prompt}
]
}
]
text = processor.apply_chat_template(message, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=1024)
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
result = processor.parse_response(response)
for key, value in result.items():
if key == "role":
print(f"Role: {value}")
elif key == "thinking":
print(f"\n=== Thoughts ===\n{value}")
elif key == "content":
print(f"\n=== Answer ===\n{value}")
elif key == "tool_calls":
print(f"\n=== Tool Calls ===\n{value}")
else:
print(f"\n{key}: {value}...\n")
=== Downloaded image ===

Role: assistant
=== Thoughts ===
Here's a thinking process to arrive at the suggested description:
1. **Analyze the Image Content:**
* **Dominant Feature:** A large suspension bridge with distinctive red/orange trusswork. This is immediately recognizable as the Golden Gate Bridge.
* **Setting:** Water (a large body of water, likely the bay/ocean).
* **Foreground/Midground:**
* Water in the immediate foreground (dark blue/green).
* A rocky outcrop/island in the middle foreground.
* A substantial stone/brick structure to the left (part of the bridge approach or a related structure).
* **Background:** Hazy landmasses/hills behind the bridge.
* **Sky:** Clear, light blue sky.
* **Atmosphere/Lighting:** Bright daylight, clear weather.
2. **Identify Key Elements for Description:**
* The Bridge (Golden Gate Bridge).
* The Water/Bay.
* The Coastal/Land features.
3. **Draft the Description (Focusing on detail and clarity):**
* *Initial thought:* It's a picture of the Golden Gate Bridge over the water.
* *Refinement (Adding detail):* The image shows the iconic red suspension bridge spanning a body of water. There is a rocky island in the foreground and some structures on the shore.
* *Enhancement (Adding context and visual appeal):* Mention the color, the scale, and the atmosphere.
4. **Final Polish and Structure (Grouping similar ideas):** (This leads to the final structured response.)
* *Identification:* State clearly what the main subject is.
* *Setting:* Describe the environment (water, sky).
* *Details:* Mention specific foreground and background elements.
5. **Review against the original prompt:** (The prompt asks "What is shown in this image?") The description accurately reflects the visual evidence. (Self-Correction: Ensure the identification is confident, which it is, based on the structure and color.)
=== Answer ===
This image shows the **Golden Gate Bridge** spanning a body of water, likely the San Francisco Bay.
Here is a breakdown of what is visible:
* **The Golden Gate Bridge:** The iconic red/orange suspension bridge dominates the frame, stretching across the water. Its distinctive structure and massive towers are clearly visible.
* **Water:** A large expanse of blue-green water fills the foreground and midground.
* **Foreground Elements:** In the immediate foreground, there is a dark, rocky outcrop or small island.
* **Shoreline/Structures:** To the left, there are stone and brick structures, suggesting the land or approach to the bridge.
* **Background:** Hazy hills or landmasses are visible in the distance behind the bridge.
* **Atmosphere:** The scene is brightly lit under a clear, light blue sky, suggesting fair weather.
In summary, it is a scenic photograph capturing the majestic view of the Golden Gate Bridge.
总结与后续步骤
在本指南中,您学习了如何使用 Gemma 4 模型的推理能力在最终答案之前生成推理过程。您学习了以下内容:
- 在
apply_chat_template中使用enable_thinking=True启用思考模式。 - 使用
TextStreamer实时观察思考过程。 - 使用
parse_response将组合输出解析为单独的thinking和answer块。 - 将思考能力应用于多模态任务(图片 + 文本)。
后续步骤
探索 Gemma 4 的更多功能:
在 Google Colab 中运行
在 GitHub 上查看源代码