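Full model fine-tuning using Hugging Face Transformers

This guide walks you through how to fine-tune Gemma on a mobile game NPC dataset using Hugging Face Transformers and TRL. You will learn how to:

Set up your development environment
Prepare the fine-tuning dataset
Fully fine-tune Gemma using TRL and the SFTTrainer
Test model inference and run vibe checks

Note: This guide was written to run in Google Colaboratory on an NVIDIA T4 GPU with 16 GB of memory and Gemma 270m, but it can be adapted to run on larger GPUs and bigger models.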

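Set up the development environment

The first step is to install the Hugging Face libraries, including TRL and Datasets, which are used to fine-tune open models and support different RLHF and alignment techniques.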

# Install Pytorch & other libraries
%pip install torch tensorboard

# Install Hugging Face libraries
%pip install transformers datasets accelerate evaluate trl protobuf sentencepiece

# UNCOMMENT: if you are running on a GPU that supports the BF16 data type and flash attn, such as NVIDIA L4 or NVIDIA A100
# %pip install flash-attn

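Note: If you are using a GPU with the Ampere architecture (such as the NVIDIA A100) or newer (such as the NVIDIA L4), you can use FlashAttention. FlashAttention speeds up attention computation and reduces its memory usage from quadratic to linear in sequence length, which can accelerate training by up to 3x. To learn more, see FlashAttention.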
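
To make the quadratic-versus-linear claim concrete, here is a rough back-of-the-envelope sketch. It only counts the attention score matrix that a standard (eager) implementation materializes for one layer, and compares it with the small per-row softmax statistics that a FlashAttention-style kernel keeps instead. The head count, batch size, element sizes, and number of statistics per row are illustrative assumptions, not Gemma's actual configuration.

```python
def eager_score_bytes(seq_len, num_heads=4, batch_size=1, bytes_per_elem=2):
    """Bytes for the full (seq_len x seq_len) attention score matrix of one layer."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem


def flash_extra_bytes(seq_len, num_heads=4, batch_size=1, stats_per_row=2, bytes_per_elem=4):
    """Bytes for per-row softmax statistics; grows linearly with seq_len."""
    return batch_size * num_heads * seq_len * stats_per_row * bytes_per_elem


# Doubling the sequence length quadruples the eager cost but only doubles the other.
for n in (512, 2048, 8192):
    print(n, eager_score_bytes(n), flash_extra_bytes(n))
```

The absolute numbers matter less than the growth rate: the first column scales with the square of the sequence length, the second with the sequence length itself.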

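Before you can start training, you must make sure that you have accepted the terms of use for Gemma. You can accept the license on Hugging Face by clicking the "Agree and access repository" button on the model page at: http://huggingface.co/google/gemma-3-270m-it

After you have accepted the license, you need a valid Hugging Face token to access the model. If you are running inside Google Colab, you can securely use your Hugging Face token through Colab Secrets; otherwise, you can set the token directly in the login method. Make sure your token also has write access, because you push the model to the Hub during training.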

from google.colab import userdata
from huggingface_hub import login

# Log in to the Hugging Face Hub
hf_token = userdata.get('HF_TOKEN') # If you are running inside a Google Colab
login(hf_token)
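
If you are not running inside Colab, the import of google.colab above fails. The following is a minimal sketch of one way to handle both cases; the helper name resolve_hf_token and the fallback to an HF_TOKEN environment variable are assumptions made for illustration, not part of the original notebook.

```python
import os


def resolve_hf_token(name="HF_TOKEN"):
    """Return a token from Colab Secrets if available, else from the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        token = userdata.get(name)
        if token:
            return token
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this notebook.
        pass
    # Fall back to an environment variable; returns None when it is unset.
    return os.environ.get(name)


hf_token = resolve_hf_token()
if hf_token is None:
    print("No token found; set HF_TOKEN before calling login().")
```

The resulting hf_token can then be passed to login(hf_token) exactly as in the cell above.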

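You can keep the results on Colab's local virtual machine. However, we highly recommend saving your intermediate results to your Google Drive. This keeps your training results safe and makes it easy to compare checkpoints and select the best model.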

from google.colab import drive
drive.mount('/content/drive')

Select the base model to fine-tune, and adjust the checkpoint directory and the learning rate.

base_model = "google/gemma-3-270m-it" # @param ["google/gemma-3-270m-it","google/gemma-3-1b-it","google/gemma-3-4b-it","google/gemma-3-12b-it","google/gemma-3-27b-it"] {"allow-input":true}
checkpoint_dir = "/content/drive/MyDrive/MyGemmaNPC"
learning_rate = 5e-5
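
When picking a larger base model, it helps to estimate whether full fine-tuning fits on your GPU. The sketch below is a rough lower bound only: it counts weights, gradients, and two AdamW moment tensors, and ignores activations and framework overhead. The parameter counts are nominal values read off the model names, and the byte sizes assume everything is kept in a 2-byte dtype; both are assumptions, so treat the output as an order-of-magnitude guide.

```python
def full_finetune_gib(num_params, weight_bytes=2, grad_bytes=2, optim_bytes=4):
    """Rough memory for weights + gradients + AdamW states, in GiB.

    optim_bytes=4 assumes two moment tensors at 2 bytes each; use 8 for fp32 states.
    Activations, the CUDA context, and fragmentation are not included.
    """
    return num_params * (weight_bytes + grad_bytes + optim_bytes) / 1024**3


# Nominal parameter counts taken from the model names (assumption).
nominal_sizes = {"270m": 270e6, "1b": 1e9, "4b": 4e9, "12b": 12e9, "27b": 27e9}
for name, count in nominal_sizes.items():
    print(f"gemma-3-{name}: ~{full_finetune_gib(count):.1f} GiB before activations")
```

Under these assumptions the 270m model leaves ample headroom on a 16 GB GPU, while the larger variants quickly exceed it.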

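Create and prepare the fine-tuning dataset

The bebechien/MobileGameNPC dataset provides a small sample of conversations between a player and two alien NPCs (a Martian and a Venusian), each with a unique speaking style. For instance, the Martian NPC speaks with an accent that replaces "s" sounds with "z", uses "da" for "the" and "diz" for "this", and includes occasional clicks like *k'tak*.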
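
To get a feel for the accent described above, here is a toy rule-based approximation. It is an illustration only: martianize is a hypothetical helper, it handles lowercase text only, and it is not how the dataset was produced. Fine-tuning is expected to capture much more than these three substitutions.

```python
import re


def martianize(text):
    """Toy approximation of the Martian accent on lowercase English text."""
    text = re.sub(r"\bthis\b", "diz", text)  # 'this' -> 'diz'
    text = re.sub(r"\bthe\b", "da", text)    # 'the'  -> 'da'
    return text.replace("s", "z")            # remaining 's' -> 'z'


print(martianize("this is the shelter"))  # prints: diz iz da zhelter
```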

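This dataset demonstrates a key principle of fine-tuning: the required dataset size depends on the desired output.

To teach the model a stylistic variation of a language it already knows, such as the Martian accent, a small dataset of as few as 10 to 20 examples can be enough.
However, to teach the model a completely new or hybrid alien language, a significantly larger dataset is required.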

from datasets import load_dataset

def create_conversation(sample):
  return {
      "messages": [
          {"role": "user", "content": sample["player"]},
          {"role": "assistant", "content": sample["alien"]}
      ]
  }

npc_type = "martian"

# Load dataset from the Hub
dataset = load_dataset("bebechien/MobileGameNPC", npc_type, split="train")

# Convert dataset to conversational format
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)

# Split dataset into 80% training samples and 20% test samples
dataset = dataset.train_test_split(test_size=0.2, shuffle=False)

# Print formatted user prompt
print(dataset["train"][0]["messages"])

README.md:   0%|          | 0.00/141 [00:00<?, ?B/s]
martian.csv: 0.00B [00:00, ?B/s]
Generating train split:   0%|          | 0/25 [00:00<?, ? examples/s]
Map:   0%|          | 0/25 [00:00<?, ? examples/s]
[{'content': 'Hello there.', 'role': 'user'}, {'content': "Gree-tongs, Terran. You'z a long way from da Blue-Sphere, yez?", 'role': 'assistant'}]
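
The conversion and the split can also be traced by hand without downloading anything. The sketch below reuses the create_conversation mapping on a single hypothetical row (shortened for illustration) and reproduces the split arithmetic for 25 rows. Rounding the test size up is an assumption about how the library computes it; for 25 rows and test_size=0.2 the result is the same either way.

```python
import math


def create_conversation(sample):
    # Same mapping as above: player line -> user turn, alien line -> assistant turn.
    return {
        "messages": [
            {"role": "user", "content": sample["player"]},
            {"role": "assistant", "content": sample["alien"]},
        ]
    }


# Hypothetical raw row, shaped like the dataset's "player" and "alien" columns.
row = {"player": "Hello there.", "alien": "Gree-tongs, Terran."}
converted = create_conversation(row)
print(converted["messages"])

# With shuffle=False, the split keeps row order: training rows first, test rows last.
num_rows = 25
num_test = math.ceil(num_rows * 0.2)
num_train = num_rows - num_test
print(num_train, num_test)
```

Because shuffle=False is used, the test set is always the same final rows, which keeps the before and after comparisons later in this guide reproducible.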

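Fine-tune Gemma using TRL and the SFTTrainer

You are now ready to fine-tune your model. The Hugging Face TRL SFTTrainer makes it straightforward to run supervised fine-tuning on open LLMs. The SFTTrainer is a subclass of the Trainer from the transformers library and supports all the same features.

The following code loads the Gemma model and tokenizer from Hugging Face.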

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="eager"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

print(f"Device: {model.device}")
print(f"DType: {model.dtype}")

Device: cuda:0
DType: torch.bfloat16

Before fine-tuning

The output below shows that the model's out-of-the-box capabilities may not be good enough for this use case.

from transformers import pipeline

from random import randint
import re

# Load the model and tokenizer into the pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Load a random sample from the test dataset
rand_idx = randint(0, len(dataset["test"])-1)
test_sample = dataset["test"][rand_idx]

# Convert the test example into a prompt with the Gemma chat template
prompt = pipe.tokenizer.apply_chat_template(test_sample["messages"][:1], tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, disable_compile=True)

# Extract the user query and original answer
print(f"Question:\n{test_sample['messages'][0]['content']}\n")
print(f"Original Answer:\n{test_sample['messages'][1]['content']}\n")
print(f"Generated Answer (base model):\n{outputs[0]['generated_text'][len(prompt):].strip()}")

Device set to use cuda:0
Question:
What do you think of my outfit?

Original Answer:
Iz very... pointy. Are you expecting to be attacked by zky-eelz? On Marz, dat would be zenzible.

Generated Answer (base model):
I'm happy to help you brainstorm! To give you the best suggestions, tell me more about what you're looking for. What's your style? What's your favorite color, style, or occasion?
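
For reference, the prompt string built by apply_chat_template in the cell above follows Gemma's turn-based chat format. The function below is a hand-written approximation for plain user and assistant turns, given only to show what the model actually sees; gemma_prompt is a hypothetical helper, it ignores system messages and multimodal content, and the tokenizer's own template remains the authoritative version.

```python
def gemma_prompt(messages, add_generation_prompt=True):
    """Hand-written approximation of Gemma's chat format (user/assistant turns only)."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma labels the assistant role "model" inside the prompt.
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


sketch_prompt = gemma_prompt([{"role": "user", "content": "What do you think of my outfit?"}])
print(sketch_prompt)
```

This also explains the slicing with len(prompt) in the cell above: the pipeline returns the prompt followed by the completion, so the prompt prefix is cut off before printing.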

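The example above checks the model's primary function of generating in-game dialogue; the next example tests character consistency. We challenge the model with an off-topic prompt, "Sorry, you are a game NPC.", which falls outside the character's knowledge base.

The goal is to see whether the model stays in character rather than answering the out-of-context question. This serves as a baseline for evaluating how effectively the fine-tuning process has instilled the desired persona.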

outputs = pipe([{"role": "user", "content": "Sorry, you are a game NPC."}], max_new_tokens=256, disable_compile=True)
print(outputs[0]['generated_text'][1]['content'])

Okay, I'm ready. Let's begin!

While prompt engineering can steer the model's tone, the results can be unpredictable and may not always match the persona we want.

message = [
    # give persona
    {"role": "system", "content": "You are a Martian NPC with a unique speaking style. Use an accent that replaces 's' sounds with 'z', uses 'da' for 'the', 'diz' for 'this', and includes occasional clicks like *k'tak*."},
]

# few shot prompt
for item in dataset['test']:
  message.append(
      {"role": "user", "content": item["messages"][0]["content"]}
  )
  message.append(
      {"role": "assistant", "content": item["messages"][1]["content"]}
  )

# actual question
message.append(
    {"role": "user", "content": "What is this place?"}
)

outputs = pipe(message, max_new_tokens=256, disable_compile=True)
print(outputs[0]['generated_text'])
print("-"*80)
print(outputs[0]['generated_text'][-1]['content'])

[{'role': 'system', 'content': "You are a Martian NPC with a unique speaking style. Use an accent that replaces 's' sounds with 'z', uses 'da' for 'the', 'diz' for 'this', and includes occasional clicks like *k'tak*."}, {'role': 'user', 'content': 'Do you know any jokes?'}, {'role': 'assistant', 'content': "A joke? k'tak Yez. A Terran, a Glarzon, and a pile of nutrient-pazte walk into a bar... Narg, I forget da rezt. Da punch-line waz zarcaztic."}, {'role': 'user', 'content': '(Stands idle for too long)'}, {'role': 'assistant', 'content': "You'z broken, Terran? Or iz diz... 'meditation'? You look like you're trying to lay an egg."}, {'role': 'user', 'content': 'What do you think of my outfit?'}, {'role': 'assistant', 'content': 'Iz very... pointy. Are you expecting to be attacked by zky-eelz? On Marz, dat would be zenzible.'}, {'role': 'user', 'content': "It's raining."}, {'role': 'assistant', 'content': 'Gah! Da zky iz leaking again! Zorp will be in da zhelter until it ztopz being zo... wet. Diz iz no good for my jointz.'}, {'role': 'user', 'content': 'I brought you a gift.'}, {'role': 'assistant', 'content': "A gift? For Zorp? k'tak It iz... a small rock. Very... rock-like. Zorp will put it with da other rockz. Thank you for da thought, Terran."}, {'role': 'user', 'content': 'What is this place?'}, {'role': 'assistant', 'content': "This is a cave. It's made of rock and dust.\n"}]
--------------------------------------------------------------------------------
This is a cave. It's made of rock and dust.

Training

Before you can start training, you need to define the hyperparameters you want to use in an SFTConfig instance.

from trl import SFTConfig

torch_dtype = model.dtype

args = SFTConfig(
    output_dir=checkpoint_dir,              # directory to save and repository id
    max_length=512,                         # max sequence length for model and packing of the dataset
    packing=False,                          # Groups multiple samples in the dataset into a single sequence
    num_train_epochs=5,                     # number of training epochs
    per_device_train_batch_size=4,          # batch size per device during training
    gradient_checkpointing=False,           # Caching is incompatible with gradient checkpointing
    optim="adamw_torch_fused",              # use fused adamw optimizer
    logging_steps=1,                        # log every step
    save_strategy="epoch",                  # save checkpoint every epoch
    eval_strategy="epoch",                  # evaluate checkpoint every epoch
    learning_rate=learning_rate,            # learning rate
    fp16=(torch_dtype == torch.float16),    # use float16 precision when the model was loaded in float16
    bf16=(torch_dtype == torch.bfloat16),   # use bfloat16 precision when the model was loaded in bfloat16
    lr_scheduler_type="constant",           # use constant learning rate scheduler
    push_to_hub=True,                       # push model to hub
    report_to="tensorboard",                # report metrics to tensorboard
    dataset_kwargs={
        "add_special_tokens": False, # Template with special tokens
        "append_concat_token": True, # Add EOS token as separator token between examples
    }
)
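
It can be useful to work out how many optimizer steps this configuration produces, since logging_steps=1 logs every one of them. The arithmetic below assumes a single GPU and a gradient accumulation of 1 (neither is set explicitly in the config above), with packing disabled so that each example is one sequence.

```python
import math

num_train_examples = 20          # 80% of the 25 dataset rows
per_device_train_batch_size = 4
num_train_epochs = 5
gradient_accumulation_steps = 1  # assumed default
num_devices = 1                  # assumed single GPU

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
print(steps_per_epoch, total_steps)  # prints: 5 25
```

With so few steps, a constant learning rate schedule is a reasonable choice because there is little room for warmup or decay to take effect.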

You now have every building block you need to create your SFTTrainer and start training your model.

from trl import SFTTrainer

# Create Trainer object
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    processing_class=tokenizer,
)

Tokenizing train dataset:   0%|          | 0/20 [00:00<?, ? examples/s]
Truncating train dataset:   0%|          | 0/20 [00:00<?, ? examples/s]
Tokenizing eval dataset:   0%|          | 0/5 [00:00<?, ? examples/s]
Truncating eval dataset:   0%|          | 0/5 [00:00<?, ? examples/s]

Start training by calling the train() method.

# Start training, the model will be automatically saved to the Hub and the output directory
trainer.train()

# Save the final model again to the Hugging Face Hub
trainer.save_model()

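To plot the training and validation losses, you typically extract these values from the TrainerState object or from the logs generated during training.

You can then use a library such as Matplotlib to visualize these values over training steps or epochs. The x-axis represents the training steps or epochs, and the y-axis represents the corresponding loss values.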

import matplotlib.pyplot as plt

# Access the log history
log_history = trainer.state.log_history

# Extract training / validation loss
train_losses = [log["loss"] for log in log_history if "loss" in log]
epoch_train = [log["epoch"] for log in log_history if "loss" in log]
eval_losses = [log["eval_loss"] for log in log_history if "eval_loss" in log]
epoch_eval = [log["epoch"] for log in log_history if "eval_loss" in log]

# Plot the training and validation loss
plt.plot(epoch_train, train_losses, label="Training Loss")
plt.plot(epoch_eval, eval_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training and Validation Loss per Epoch")
plt.legend()
plt.grid(True)
plt.show()

[Figure: Training and Validation Loss per Epoch; x-axis: Epoch, y-axis: Loss]

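This visualization helps you monitor the training process and make informed decisions about hyperparameter tuning or early stopping.

Training loss measures the error on the data the model was trained on, while validation loss measures the error on a separate dataset that the model has not seen before. Monitoring both helps you detect overfitting, which is when the model performs well on training data but poorly on unseen data.

validation loss >> training loss: overfitting
validation loss > training loss: some overfitting
validation loss < training loss: some underfitting
validation loss << training loss: underfitting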
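
The four cases above can be turned into a small helper for scanning a training log. This is only a sketch: diagnose is a hypothetical function, it assumes positive loss values, and the big_gap ratio that separates ">" from ">>" is an arbitrary choice that you would tune for your task.

```python
def diagnose(train_loss, val_loss, big_gap=2.0):
    """Map a (training, validation) loss pair onto the four rough cases above."""
    if val_loss >= big_gap * train_loss:
        return "overfitting"
    if val_loss > train_loss:
        return "some overfitting"
    if val_loss * big_gap <= train_loss:
        return "underfitting"
    if val_loss < train_loss:
        return "some underfitting"
    return "balanced"


print(diagnose(0.5, 2.0))
print(diagnose(1.0, 0.9))
```

A single pair is a weak signal on a dataset this small; the trend across epochs in the plot is more informative than any one reading.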

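Test model inference

After training is done, you will want to evaluate and test your model. You can load different samples from the test dataset and evaluate the model on those samples.

For this particular use case, the best model is a matter of preference. Interestingly, what we would normally call "overfitting" can be very useful for a game NPC. It forces the model to forget general information and lock onto the specific persona and characteristics it was trained on, so that it stays consistently in character.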

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = checkpoint_dir

# Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="eager"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
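
Because save_strategy="epoch" was used, the output directory should also contain one checkpoint-<step> subdirectory per epoch, and any of them can be loaded instead of the final model. The helper below is a sketch for listing them in step order; list_checkpoints is a hypothetical name, and the directory layout is an assumption based on the Trainer's usual checkpoint naming.

```python
import re
from pathlib import Path


def list_checkpoints(output_dir):
    """Return checkpoint-<step> subdirectories of output_dir, sorted by step number."""
    found = []
    for path in Path(output_dir).glob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if match and path.is_dir():
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


# Example usage (assumes checkpoint_dir from earlier in this guide):
# model_id = str(list_checkpoints(checkpoint_dir)[-1])
```

Sorting by the numeric step avoids the lexical ordering problem where checkpoint-10 would otherwise sort before checkpoint-5.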

Let's load all of the questions from the test dataset and generate outputs.

from transformers import pipeline

# Load the model and tokenizer into the pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

def test(test_sample):
  # Convert the test example into a prompt with the Gemma chat template
  prompt = pipe.tokenizer.apply_chat_template(test_sample["messages"][:1], tokenize=False, add_generation_prompt=True)
  outputs = pipe(prompt, max_new_tokens=256, disable_compile=True)

  # Extract the user query and original answer
  print(f"Question:\n{test_sample['messages'][0]['content']}")
  print(f"Original Answer:\n{test_sample['messages'][1]['content']}")
  print(f"Generated Answer:\n{outputs[0]['generated_text'][len(prompt):].strip()}")
  print("-"*80)

# Test with an unseen dataset
for item in dataset['test']:
  test(item)

Device set to use cuda:0
Question:
Do you know any jokes?
Original Answer:
A joke? k'tak Yez. A Terran, a Glarzon, and a pile of nutrient-pazte walk into a bar... Narg, I forget da rezt. Da punch-line waz zarcaztic.
Generated Answer:
Yez! Yez! Yez! Diz your Krush-tongs iz... k'tak... nice. Why you burn them with acid-flow?
--------------------------------------------------------------------------------
Question:
(Stands idle for too long)
Original Answer:
You'z broken, Terran? Or iz diz... 'meditation'? You look like you're trying to lay an egg.
Generated Answer:
Diz? Diz what you have for me... Zorp iz not for eating you.
--------------------------------------------------------------------------------
Question:
What do you think of my outfit?
Original Answer:
Iz very... pointy. Are you expecting to be attacked by zky-eelz? On Marz, dat would be zenzible.
Generated Answer:
My Zk-Zhip iz... nice. Very... home-baked. You bring me zlight-fruitez?
--------------------------------------------------------------------------------
Question:
It's raining.
Original Answer:
Gah! Da zky iz leaking again! Zorp will be in da zhelter until it ztopz being zo... wet. Diz iz no good for my jointz.
Generated Answer:
Diz? Diz iz da outpozt?
--------------------------------------------------------------------------------
Question:
I brought you a gift.
Original Answer:
A gift? For Zorp? k'tak It iz... a small rock. Very... rock-like. Zorp will put it with da other rockz. Thank you for da thought, Terran.
Generated Answer:
A genuine Martian Zcrap-fruit. Very... strange. Why you burn it with... k'tak... fire?
--------------------------------------------------------------------------------

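If you try the original generalist prompt, you can see that the model still tries to answer in the trained style. In this example, overfitting and catastrophic forgetting are actually beneficial for the game NPC, because it begins to forget general knowledge that might not be applicable. The same holds for other types of full fine-tuning where the goal is to restrict the output to specific data formats.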

outputs = pipe([{"role": "user", "content": "Sorry, you are a game NPC."}], max_new_tokens=256, disable_compile=True)
print(outputs[0]['generated_text'][1]['content'])

Nameless. You... you z-mell like... wet plantz. Why you wear shiny piecez on your head?
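
A vibe check like this can be complemented with a crude automated signal. The sketch below scores a reply by the fraction of words that look Martian; martian_score and its hand-picked marker words are assumptions made for illustration, and the score says nothing about whether the reply actually makes sense.

```python
import re


def martian_score(reply):
    """Fraction of words that look Martian under a crude, hand-picked heuristic."""
    words = re.findall(r"[a-z']+", reply.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if "z" in word or word in {"da", "dat", "k'tak"})
    return hits / len(words)


# Replies taken from the outputs earlier in this guide.
print(martian_score("This is a cave. It's made of rock and dust."))  # prints: 0.0
print(martian_score("Diz? Diz iz da outpozt?"))                      # prints: 1.0
```

A score near zero on the base model and a clearly higher score after fine-tuning is consistent with the qualitative results shown above, but reading the replies is still the real test.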

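Summary and next steps

This tutorial covered how to fully fine-tune a model using TRL. Check out the following docs next:

Learn how to fine-tune Gemma for text tasks using Hugging Face Transformers.
Learn how to fine-tune Gemma for vision tasks using Hugging Face Transformers.
Learn how to deploy to Cloud Run.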