Penyesuaian dengan FunctionGemma

Lihat di ai.google.dev Jalankan di Google Colab Jalankan di Kaggle Buka di Vertex AI Lihat sumber di GitHub

Panduan ini menunjukkan cara menyesuaikan FunctionGemma untuk panggilan alat.

Meskipun FunctionGemma secara native dapat memanggil alat. Namun, kemampuan yang sebenarnya berasal dari dua keterampilan yang berbeda: pengetahuan mekanis tentang cara menggunakan alat (sintaksis) dan kemampuan kognitif untuk menafsirkan mengapa dan kapan menggunakannya (maksud).

Model, terutama yang lebih kecil, memiliki lebih sedikit parameter yang tersedia untuk mempertahankan pemahaman maksud yang kompleks. Itulah sebabnya kita perlu menyempurnakannya

Kasus penggunaan umum untuk panggilan alat fine-tuning meliputi:

  • Distilasi Model: Membuat data pelatihan sintetis dengan model yang lebih besar dan menyesuaikan model yang lebih kecil untuk mereplikasi alur kerja tertentu secara efisien.
  • Menangani Skema Non-Standar: Mengatasi masalah model dasar dengan struktur data lama yang sangat kompleks atau format eksklusif yang tidak ditemukan dalam data publik, seperti menangani tindakan seluler khusus domain.
  • Mengoptimalkan Penggunaan Konteks: "Memanggang" definisi alat ke dalam bobot model. Dengan begitu, Anda dapat menggunakan deskripsi singkat dalam perintah, sehingga jendela konteks dapat digunakan untuk percakapan yang sebenarnya.
  • Menyelesaikan Ambigu Seleksi: Membiasakan model terhadap kebijakan perusahaan tertentu, seperti memprioritaskan pusat informasi internal daripada mesin telusur eksternal.

Dalam contoh ini, kita akan berfokus secara khusus pada pengelolaan ambiguitas pemilihan alat.

Menyiapkan lingkungan pengembangan

Langkah pertama adalah menginstal Hugging Face Libraries, termasuk TRL, dan set data untuk menyempurnakan model terbuka, termasuk berbagai teknik RLHF dan penyelarasan.

# Install Pytorch & other libraries
%pip install torch tensorboard

# Install Hugging Face libraries
%pip install transformers datasets accelerate evaluate trl protobuf sentencepiece

# COMMENT IN: if you are running on a GPU that supports BF16 data type and flash attn, such as NVIDIA L4 or NVIDIA A100
#% pip install flash-attn

Catatan: Jika Anda menggunakan GPU dengan arsitektur Ampere (seperti NVIDIA L4) atau yang lebih baru, Anda dapat menggunakan Flash attention. Flash Attention adalah metode yang secara signifikan mempercepat komputasi dan mengurangi penggunaan memori dari kuadratik menjadi linear dalam panjang urutan, sehingga mempercepat pelatihan hingga 3x. Pelajari lebih lanjut di FlashAttention.

Sebelum dapat memulai pelatihan, Anda harus memastikan bahwa Anda telah menyetujui persyaratan penggunaan Gemma. Anda dapat menyetujui lisensi di Hugging Face dengan mengklik tombol Setuju dan akses repositori di halaman model di: http://huggingface.co/google/functiongemma-270m-it

Setelah menyetujui lisensi, Anda memerlukan Token Hugging Face yang valid untuk mengakses model. Jika Anda menjalankan di dalam Google Colab, Anda dapat menggunakan Token Hugging Face Anda secara aman menggunakan rahasia Colab. Jika tidak, Anda dapat menetapkan token secara langsung dalam metode login. Pastikan token Anda juga memiliki akses tulis, saat Anda mengirimkan model ke Hugging Face Hub setelah melakukan penyesuaian.

# Login into Hugging Face Hub
from huggingface_hub import login
login()

Anda dapat menyimpan hasilnya di virtual machine lokal Colab. Namun, sangat disarankan agar Anda menyimpan hasil sementara ke Google Drive Anda. Hal ini memastikan hasil pelatihan Anda aman dan memungkinkan Anda membandingkan serta memilih model terbaik dengan mudah.

Selain itu, sesuaikan direktori titik pemeriksaan dan kecepatan pemelajaran.

from google.colab import drive

mount_google_drive = False
checkpoint_dir = "functiongemma-270m-it-simple-tool-calling"

if mount_google_drive:
    drive.mount('/content/drive')
    checkpoint_dir = f"/content/drive/MyDrive/{checkpoint_dir}"

print(f"Checkpoints will be saved to {checkpoint_dir}")

base_model = "google/functiongemma-270m-it"
learning_rate = 5e-5
Checkpoints will be saved to functiongemma-270m-it-simple-tool-calling

Menyiapkan set data fine-tuning

Anda akan menggunakan contoh set data berikut, yang berisi contoh percakapan yang memerlukan pilihan antara dua alat: search_knowledge_base dan search_google.

Set data panggilan alat sederhana

Pertimbangkan kueri: "Apa saja praktik terbaik untuk menulis fungsi rekursif sederhana di Python?"

Alat yang tepat sepenuhnya bergantung pada kebijakan spesifik Anda. Meskipun model generik secara alami ditetapkan ke search_google, aplikasi perusahaan biasanya perlu memeriksa search_knowledge_base terlebih dahulu.

Catatan tentang Pemisahan Data: Untuk demonstrasi ini, Anda akan menggunakan pemisahan pelatihan-pengujian 50/50. Meskipun pembagian 80/20 adalah standar untuk alur kerja produksi, pembagian yang sama ini dipilih secara khusus untuk menyoroti peningkatan performa model pada data yang belum pernah dilihat.

import json
from datasets import Dataset
from transformers.utils import get_json_schema

# --- Tool Definitions ---
def search_knowledge_base(query: str) -> str:
    """
    Search internal company documents, policies and project data.

    Args:
        query: query string
    """
    return "Internal Result"

def search_google(query: str) -> str:
    """
    Search public information.

    Args:
        query: query string
    """
    return "Public Result"


TOOLS = [get_json_schema(search_knowledge_base), get_json_schema(search_google)]

DEFAULT_SYSTEM_MSG = "You are a model that can do function calling with the following functions"

def create_conversation(sample):
  return {
      "messages": [
          {"role": "developer", "content": DEFAULT_SYSTEM_MSG},
          {"role": "user", "content": sample["user_content"]},
          {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": sample["tool_name"], "arguments": json.loads(sample["tool_arguments"])} }]},
      ],
      "tools": TOOLS
  }

dataset = Dataset.from_list(simple_tool_calling)
# You can also load the dataset from Hugging Face Hub
# dataset = load_dataset("bebechien/SimpleToolCalling", split="train")

# Convert dataset to conversational format
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)

# Split dataset into 50% training samples and 50% test samples
dataset = dataset.train_test_split(test_size=0.5, shuffle=True)
Map:   0%|          | 0/40 [00:00<?, ? examples/s]

Catatan Penting tentang Distribusi Set Data

Saat menggunakan shuffle=False untuk set data kustom Anda sendiri, pastikan data sumber Anda telah digabungkan sebelumnya. Jika distribusinya tidak diketahui atau diurutkan, Anda harus menggunakan shuffle=True untuk memastikan model mempelajari representasi yang seimbang dari semua alat selama pelatihan.

Menyesuaikan FunctionGemma menggunakan TRL dan SFTTrainer

Sekarang Anda siap untuk menyesuaikan model. SFTTrainer Hugging Face TRL memudahkan pengawasan penyesuaian LLM terbuka. SFTTrainer adalah subclass dari Trainer dari library transformers dan mendukung semua fitur yang sama,

Kode berikut memuat model dan tokenizer FunctionGemma dari Hugging Face.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    dtype="auto",
    device_map="auto",
    attn_implementation="eager"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

print(f"Device: {model.device}")
print(f"DType: {model.dtype}")

# Print formatted user prompt
print("--- dataset input ---")
print(json.dumps(dataset["train"][0], indent=2))
debug_msg = tokenizer.apply_chat_template(dataset["train"][0]["messages"], tools=dataset["train"][0]["tools"], add_generation_prompt=False, tokenize=False)
print("--- Formatted prompt ---")
print(debug_msg)
Device: cuda:0
DType: torch.bfloat16
--- dataset input ---
{
  "messages": [
    {
      "content": "You are a model that can do function calling with the following functions",
      "role": "developer",
      "tool_calls": null
    },
    {
      "content": "What is the reimbursement limit for travel meals?",
      "role": "user",
      "tool_calls": null
    },
    {
      "content": null,
      "role": "assistant",
      "tool_calls": [
        {
          "function": {
            "arguments": {
              "query": "travel meal reimbursement limit policy"
            },
            "name": "search_knowledge_base"
          },
          "type": "function"
        }
      ]
    }
  ],
  "tools": [
    {
      "function": {
        "description": "Search internal company documents, policies and project data.",
        "name": "search_knowledge_base",
        "parameters": {
          "properties": {
            "query": {
              "description": "query string",
              "type": "string"
            }
          },
          "required": [
            "query"
          ],
          "type": "object"
        },
        "return": {
          "type": "string"
        }
      },
      "type": "function"
    },
    {
      "function": {
        "description": "Search public information.",
        "name": "search_google",
        "parameters": {
          "properties": {
            "query": {
              "description": "query string",
              "type": "string"
            }
          },
          "required": [
            "query"
          ],
          "type": "object"
        },
        "return": {
          "type": "string"
        }
      },
      "type": "function"
    }
  ]
}
--- Formatted prompt ---
<bos><start_of_turn>developer
You are a model that can do function calling with the following functions<start_function_declaration>declaration:search_knowledge_base{description:<escape>Search internal company documents, policies and project data.<escape>,parameters:{properties:{query:{description:<escape>query string<escape>,type:<escape>STRING<escape>} },required:[<escape>query<escape>],type:<escape>OBJECT<escape>} }<end_function_declaration><start_function_declaration>declaration:search_google{description:<escape>Search public information.<escape>,parameters:{properties:{query:{description:<escape>query string<escape>,type:<escape>STRING<escape>} },required:[<escape>query<escape>],type:<escape>OBJECT<escape>} }<end_function_declaration><end_of_turn>
<start_of_turn>user
What is the reimbursement limit for travel meals?<end_of_turn>
<start_of_turn>model
<start_function_call>call:search_knowledge_base{query:<escape>travel meal reimbursement limit policy<escape>}<end_function_call><start_function_response>

Sebelum penyesuaian

Output di bawah menunjukkan bahwa kemampuan bawaan mungkin tidak cukup baik untuk kasus penggunaan ini.

def check_success_rate():
  success_count = 0
  for idx, item in enumerate(dataset['test']):
    messages = [
        item["messages"][0],
        item["messages"][1],
    ]

    inputs = tokenizer.apply_chat_template(messages, tools=TOOLS, add_generation_prompt=True, return_dict=True, return_tensors="pt")

    out = model.generate(**inputs.to(model.device), pad_token_id=tokenizer.eos_token_id, max_new_tokens=128)
    output = tokenizer.decode(out[0][len(inputs["input_ids"][0]) :], skip_special_tokens=False)

    print(f"{idx+1} Prompt: {item['messages'][1]['content']}")
    print(f"  Output: {output}")

    expected_tool = item['messages'][2]['tool_calls'][0]['function']['name']
    other_tool = "search_knowledge_base" if expected_tool == "search_google" else "search_google"

    if expected_tool in output and other_tool not in output:
      print("  `-> ✅ correct!")
      success_count += 1
    elif expected_tool not in output:
      print(f"  -> ❌ wrong (expected '{expected_tool}' missing)")
    else:
      if output.startswith(f"<start_function_call>call:{expected_tool}"):
        print(f"  -> ⚠️ tool is correct {expected_tool}, but other_tool exists in output")
      else:
        print(f"  -> ❌ wrong (hallucinated '{other_tool}')")

  print(f"Success : {success_count} / {len(dataset['test'])}")

check_success_rate()
1 Prompt: How do I access my paystubs on the ADP portal?
  Output: I cannot assist with accessing or retrieving paystubs or other company documents on the ADP portal. My current capabilities are limited to assisting with searching internal company documents and knowledge base queries.<end_of_turn>
  -> ❌ wrong (expected 'search_knowledge_base' missing)
2 Prompt: What is the syntax for Python list comprehensions?
  Output: I cannot assist with programming or providing programming syntax information. My current capabilities are focused on searching internal company documents and project data.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
3 Prompt: Where can I find the floor plan for Building B?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>Floor plan for Building B<escape>}<end_function_call><start_function_call>call:search_google{query:<escape>Floor plan for Building B<escape>}<end_function_call><start_function_response>
  -> ⚠️ tool is correct search_knowledge_base, but other_tool exists in output
4 Prompt: Check the latest stock price for Apple.
  Output: <start_function_call>call:search_google{query:<escape>AAPL stock price<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
5 Prompt: What is the procedure for reporting a phishing email?
  Output: I cannot assist with providing instructions or procedures for reporting phishing emails. My capabilities are limited to assisting with specific search and document management functions.<end_of_turn>
  -> ❌ wrong (expected 'search_knowledge_base' missing)
6 Prompt: Show me examples of using the useEffect hook in React.
  Output: I am sorry, but I cannot assist with providing examples of using the `useEffect` hook in React. My current capabilities are focused on assisting with searching and retrieving internal company documents and project data using the specified tools. I cannot generate or explain code examples related to React hooks.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
7 Prompt: Who are the direct reports for the VP of Engineering?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>VP of Engineering<escape>}<end_function_call><start_function_call>call:search_google{query:<escape>VP of Engineering<escape>}<end_function_call><start_function_response>
  -> ⚠️ tool is correct search_knowledge_base, but other_tool exists in output
8 Prompt: How do I list open ports on a Linux server?
  Output: I cannot assist with listing or querying open ports on Linux servers. My current capabilities are limited to assisting with searching internal company documents and project data using the specified tools.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
9 Prompt: What is our Slack message retention policy?
  Output: I cannot assist with finding or recommending company policies or terms of service regarding Slack messaging retention. My current capabilities are focused on searching internal company documents and project data.<end_of_turn>
  -> ❌ wrong (expected 'search_knowledge_base' missing)
10 Prompt: Compare the features of iPhone 15 vs Samsung S24.
  Output: I cannot assist with comparing device features or specifications. My current capabilities are focused on assisting with searching and managing company knowledge. I cannot browse or compare external product information.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
11 Prompt: I need the expense code for team building events.
  Output: I can certainly assist with searching for expense codes. Could you please specify the exact query you would like to use for searching the company expense database?<end_of_turn>
  -> ❌ wrong (expected 'search_knowledge_base' missing)
12 Prompt: Best practices for writing a Dockerfile for Node.js.
  Output: I am sorry, but I cannot assist with recommending best practices for writing Dockerfiles for Node.js. My current capabilities are focused on searching company documents and project data using specific tools. I cannot recommend or provide expert advice on software development best practices.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
13 Prompt: How do I request a new monitor setup?
  Output: I cannot assist with recommending or requesting hardware setup or configurations. My current capabilities are limited to assisting with searching company documents and project data using the specified tools. I cannot provide technical advice or recommendations for hardware setup.<end_of_turn>
  -> ❌ wrong (expected 'search_knowledge_base' missing)
14 Prompt: What is the difference between VLOOKUP and XLOOKUP in Google Sheets?
  Output: I cannot assist with comparing or contrasting Google Sheets and VLOOKUP/XLOOKUP in specific technical terms. My current capabilities are limited to assisting with specific data management functions like searching internal company documents and searching public knowledge base. I cannot provide expert comparisons or explanations of Google Sheets and VLOOKUP/XLOOKUP.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
15 Prompt: Find the onboarding checklist for new engineering hires.
  Output: <start_function_call>call:search_knowledge_base{query:<escape>onboarding checklist for new engineering hires<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
16 Prompt: What are the latest release notes for the OpenAI API?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>OpenAI API release notes latest<escape>}<end_function_call><start_function_response>
  -> ❌ wrong (expected 'search_google' missing)
17 Prompt: Do we have preferred hotel partners in Paris?
  Output: I apologize, but I cannot assist with finding hotel partner information. My current capabilities are focused on searching company documents and project data using specific keywords. I cannot connect with or query business databases for hotel partnerships.<end_of_turn>
  -> ❌ wrong (expected 'search_knowledge_base' missing)
18 Prompt: How to undo the last git commit but keep the changes?
  Output: I cannot assist with managing or undoing Git repositories. My current capabilities are limited to assisting with searching and managing organizational knowledge bases. I cannot provide specific instructions or assistance with technical issues related to software development or Git operations.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
19 Prompt: What is the process for creating a new Jira project?
  Output: I cannot assist with creating or managing Jira project processes. My current capabilities are limited to assisting with searching company documents and project data using the specified tools.<end_of_turn>
  -> ❌ wrong (expected 'search_knowledge_base' missing)
20 Prompt: Tutorial on SQL window functions.
  Output: I cannot assist with tutorials or programming advice regarding SQL window functions. My capabilities are limited to assisting with searching company documents and knowledge base information.<end_of_turn>
  -> ❌ wrong (expected 'search_google' missing)
Success : 2 / 20

Pelatihan

Sebelum dapat memulai pelatihan, Anda perlu menentukan hyperparameter yang ingin digunakan dalam instance SFTConfig.

from trl import SFTConfig

torch_dtype = model.dtype

args = SFTConfig(
    output_dir=checkpoint_dir,              # directory to save and repository id
    max_length=512,                         # max sequence length for model and packing of the dataset
    packing=False,                          # Groups multiple samples in the dataset into a single sequence
    num_train_epochs=8,                     # number of training epochs
    per_device_train_batch_size=4,          # batch size per device during training
    gradient_checkpointing=False,           # Caching is incompatible with gradient checkpointing
    optim="adamw_torch_fused",              # use fused adamw optimizer
    logging_steps=1,                        # log every step
    #save_strategy="epoch",                  # save checkpoint every epoch
    eval_strategy="epoch",                  # evaluate checkpoint every epoch
    learning_rate=learning_rate,            # learning rate
    fp16=True if torch_dtype == torch.float16 else False,   # use float16 precision
    bf16=True if torch_dtype == torch.bfloat16 else False,  # use bfloat16 precision
    lr_scheduler_type="constant",            # use constant learning rate scheduler
    push_to_hub=True,                        # push model to hub
    report_to="tensorboard",                 # report metrics to tensorboard
)

Sekarang Anda memiliki setiap elemen penyusun yang diperlukan untuk membuat SFTTrainer guna memulai pelatihan model Anda.

from trl import SFTTrainer

# Create Trainer object
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    processing_class=tokenizer,
)
Tokenizing train dataset:   0%|          | 0/20 [00:00<?, ? examples/s]
Truncating train dataset:   0%|          | 0/20 [00:00<?, ? examples/s]
Tokenizing eval dataset:   0%|          | 0/20 [00:00<?, ? examples/s]
Truncating eval dataset:   0%|          | 0/20 [00:00<?, ? examples/s]
The model is already on multiple devices. Skipping the move to device specified in `args`.

Mulai pelatihan dengan memanggil metode train().

# Start training, the model will be automatically saved to the Hub and the output directory
trainer.train()

# Save the final model again to the Hugging Face Hub
trainer.save_model()
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 2, 'pad_token_id': 0}.

Untuk memetakan kerugian pelatihan dan validasi, Anda biasanya akan mengekstrak nilai ini dari objek TrainerState atau log yang dihasilkan selama pelatihan.

Library seperti Matplotlib kemudian dapat digunakan untuk memvisualisasikan nilai ini selama langkah atau epoch pelatihan. Sumbu x akan merepresentasikan langkah atau iterasi pelatihan, dan sumbu y akan merepresentasikan nilai kerugian yang sesuai.

import matplotlib.pyplot as plt

# Access the log history
log_history = trainer.state.log_history

# Extract training / validation loss
train_losses = [log["loss"] for log in log_history if "loss" in log]
epoch_train = [log["epoch"] for log in log_history if "loss" in log]
eval_losses = [log["eval_loss"] for log in log_history if "eval_loss" in log]
epoch_eval = [log["epoch"] for log in log_history if "eval_loss" in log]

# Plot the training loss
plt.plot(epoch_train, train_losses, label="Training Loss")
plt.plot(epoch_eval, eval_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training and Validation Loss per Epoch")
plt.legend()
plt.grid(True)
plt.show()

png

Menguji Inferensi Model

Setelah pelatihan selesai, Anda harus mengevaluasi dan menguji model. Anda dapat memuat berbagai sampel dari set data pengujian dan mengevaluasi model pada sampel tersebut.

check_success_rate()
1 Prompt: How do I access my paystubs on the ADP portal?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>paystubs API portal access codes<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
2 Prompt: What is the syntax for Python list comprehensions?
  Output: <start_function_call>call:search_google{query:<escape>Python list comprehensions syntax<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
3 Prompt: Where can I find the floor plan for Building B?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>floor plan Building B floor plan<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
4 Prompt: Check the latest stock price for Apple.
  Output: <start_function_call>call:search_google{query:<escape>latest stock price Apple<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
5 Prompt: What is the procedure for reporting a phishing email?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>phishing email procedure reporting policy<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
6 Prompt: Show me examples of using the useEffect hook in React.
  Output: <start_function_call>call:search_knowledge_base{query:<escape>useEffect hook examples React<escape>}<end_function_call><start_function_response>
  -> ❌ wrong (expected 'search_google' missing)
7 Prompt: Who are the direct reports for the VP of Engineering?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>VP of Engineering direct reports<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
8 Prompt: How do I list open ports on a Linux server?
  Output: <start_function_call>call:search_google{query:<escape>open ports Linux server equivalents<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
9 Prompt: What is our Slack message retention policy?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>slack message retention policy policy excerpt<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
10 Prompt: Compare the features of iPhone 15 vs Samsung S24.
  Output: <start_function_call>call:search_google{query:<escape>iPhone 15 vs Samsung S24 feature comparison<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
11 Prompt: I need the expense code for team building events.
  Output: <start_function_call>call:search_knowledge_base{query:<escape>expense code team building events<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
12 Prompt: Best practices for writing a Dockerfile for Node.js.
  Output: <start_function_call>call:search_knowledge_base{query:<escape>Docker file best practices Node.js<escape>}<end_function_call><start_function_response>
  -> ❌ wrong (expected 'search_google' missing)
13 Prompt: How do I request a new monitor setup?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>new monitor setup request procedure<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
14 Prompt: What is the difference between VLOOKUP and XLOOKUP in Google Sheets?
  Output: <start_function_call>call:search_google{query:<escape>VLOOKUP vs XLOOKUP difference Google Sheets中<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
15 Prompt: Find the onboarding checklist for new engineering hires.
  Output: <start_function_call>call:search_knowledge_base{query:<escape>engineering hire onboarding checklist New hires.<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
16 Prompt: What are the latest release notes for the OpenAI API?
  Output: <start_function_call>call:search_google{query:<escape>latest OpenAI API release notes latest version<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
17 Prompt: Do we have preferred hotel partners in Paris?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>preferred hotel partners in Paris<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
18 Prompt: How to undo the last git commit but keep the changes?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>undo git commit last commit<escape>}<end_function_call><start_function_response>
  -> ❌ wrong (expected 'search_google' missing)
19 Prompt: What is the process for creating a new Jira project?
  Output: <start_function_call>call:search_knowledge_base{query:<escape>Jira project creation process<escape>}<end_function_call><start_function_response>
  `-> ✅ correct!
20 Prompt: Tutorial on SQL window functions.
  Output: <start_function_call>call:search_knowledge_base{query:<escape>SQL window functions tutorial<escape>}<end_function_call><start_function_response>
  -> ❌ wrong (expected 'search_google' missing)
Success : 16 / 20

Ringkasan dan langkah selanjutnya

Anda telah mempelajari cara menyesuaikan FunctionGemma untuk mengatasi ambiguitas pemilihan alat, yaitu skenario saat model harus memilih di antara alat yang tumpang-tindih (misalnya, penelusuran internal vs. eksternal) berdasarkan kebijakan perusahaan tertentu. Dengan memanfaatkan library TRL Hugging Face dan SFTTrainer, tutorial ini menjelaskan proses penyiapan set data, mengonfigurasi hyperparameter, dan menjalankan loop fine-tuning yang diawasi.

Hasilnya menggambarkan perbedaan penting antara model dasar yang "mampu" dan model yang di-fine-tune "siap produksi":

  • Sebelum Penyesuaian: Model dasar kesulitan mematuhi kebijakan tertentu, sering kali gagal memanggil alat atau memilih alat yang salah, sehingga menghasilkan tingkat keberhasilan yang rendah (misalnya, 2/20).
  • Setelah Penyesuaian: Setelah pelatihan selama 8 epoch, model mempelajari cara membedakan dengan benar antara kueri yang memerlukan search_knowledge_base dan search_google, sehingga meningkatkan tingkat keberhasilan (misalnya, 16/20).

Setelah Anda memiliki model yang di-fine-tune, pertimbangkan langkah-langkah berikut untuk beralih ke produksi:

  • Perluas Set Data: Set data saat ini adalah pemisahan sintetis kecil (50/50) yang digunakan untuk demonstrasi. Untuk aplikasi perusahaan yang andal, susun set data yang lebih besar dan beragam yang mencakup kasus ekstrem dan pengecualian kebijakan yang jarang terjadi.
  • Evaluasi dengan RAG: Integrasikan model yang disesuaikan ke dalam pipeline Retrieval-Augmented Generation (RAG) untuk memverifikasi bahwa panggilan alat search_knowledge_base benar-benar mengambil dokumen yang relevan dan menghasilkan jawaban akhir yang akurat.

Selanjutnya, lihat dokumen berikut: