Function calling with Gemma 4

When you use a generative artificial intelligence (AI) model such as Gemma, you may want it to operate programming interfaces to complete tasks or answer questions. Instructing a model by defining a programming interface and then making a request that uses that interface is called function calling.

Important: Gemma models cannot execute code on their own. When you generate code with function calling, you must run the generated code yourself or run it as part of your application. Always put safeguards in place to validate any generated code before executing it.

This guide walks through using Gemma 4 within the Hugging Face ecosystem.

This notebook will run on a T4 GPU.

Install Python packages

Install the Hugging Face libraries required to run the Gemma model and make requests.

# Install PyTorch & other libraries
pip install torch accelerate

# Install the transformers library
pip install transformers

Load the model

Use the transformers library to create an instance of a processor and a model with the AutoProcessor and AutoModelForMultimodalLM classes, as shown in the following code example:

MODEL_ID = "google/gemma-4-E2B-it" # @param ["google/gemma-4-E2B-it","google/gemma-4-E4B-it", "google/gemma-4-31B-it", "google/gemma-4-26B-A4B-it"]

from transformers import AutoProcessor, AutoModelForMultimodalLM

model = AutoModelForMultimodalLM.from_pretrained(MODEL_ID, dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

Passing tools

You can pass tools to the model through the tools argument of the apply_chat_template() function. There are two ways to define these tools:

JSON schema: You can manually build a JSON dictionary that defines the function name, description, and parameters (including types and required fields).
Raw Python functions: You can pass actual Python functions. The system automatically generates the required JSON schema by parsing the function's type hints, arguments, and docstring. For best results, docstrings should follow the Google Python Style Guide.

Below is an example that uses a JSON schema.

from transformers import TextStreamer

weather_function_schema = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city name, e.g. San Francisco",
                },
            },
            "required": ["location"],
        },
    }
}

message = [
    {
        "role": "system", "content": "You are a helpful assistant."
    },
    {
        "role": "user", "content": "What's the temperature in London?"
    }
]

text = processor.apply_chat_template(message, tools=[weather_function_schema], tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
streamer = TextStreamer(processor)
outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=64)

<bos><|turn>system
You are a helpful assistant.<|tool>declaration:get_current_temperature{description:<|"|>Gets the current temperature for a given location.<|"|>,parameters:{properties:{location:{description:<|"|>The city name, e.g. San Francisco<|"|>,type:<|"|>STRING<|"|>} },required:[<|"|>location<|"|>],type:<|"|>OBJECT<|"|>} }<tool|><turn|>
<|turn>user
What's the temperature in London?<turn|>
<|turn>model
<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|><|tool_response>

And here is the same example with a raw Python function.

from transformers.utils import get_json_schema

def get_current_temperature(location: str):
    """
    Gets the current temperature for a given location.

    Args:
        location: The city name, e.g. San Francisco
    """
    return "15°C"

message = [
    {
        "role": "user", "content": "What's the temperature in London?"
    }
]

text = processor.apply_chat_template(message, tools=[get_json_schema(get_current_temperature)], tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
streamer = TextStreamer(processor)
outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=256)

<bos><|turn>system
<|tool>declaration:get_current_temperature{description:<|"|>Gets the current temperature for a given location.<|"|>,parameters:{properties:{location:{description:<|"|>The city name, e.g. San Francisco<|"|>,type:<|"|>STRING<|"|>} },required:[<|"|>location<|"|>],type:<|"|>OBJECT<|"|>} }<tool|><turn|>
<|turn>user
What's the temperature in London?<turn|>
<|turn>model
<|tool_call>call:get_current_temperature{location:<|"|>London<|"|>}<tool_call|><|tool_response>
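To make the automatic conversion less opaque, here is a simplified, standard-library-only sketch of the idea: read the signature for types and required parameters, and read the Google-style docstring for descriptions. The helper name sketch_schema and the PY_TO_JSON mapping are hypothetical, and this is not how get_json_schema is implemented; for instance, it ignores the "(choices: [...])" convention and multi-line argument descriptions.

```python
import inspect
import re

# Map a few Python annotations to JSON Schema type names (illustrative subset).
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(func):
    """Build a small function schema from a signature and a Google-style docstring."""
    doc = inspect.getdoc(func) or ""
    # Everything before the "Args:" header is treated as the description.
    description, _, rest = doc.partition("Args:")
    # Keep only the Args section, dropping a later "Returns:" section if present.
    args_section = rest.partition("Returns:")[0]
    arg_docs = dict(re.findall(r"^\s*(\w+):\s*(.+)$", args_section, re.MULTILINE))

    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            # Unknown or missing annotations fall back to a generic "object".
            "type": PY_TO_JSON.get(param.annotation, "object"),
            "description": arg_docs.get(name, "").strip(),
        }
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description.strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_current_temperature(location: str):
    """
    Gets the current temperature for a given location.

    Args:
        location: The city name, e.g. San Francisco
    """
    return "15°C"


schema = sketch_schema(get_current_temperature)
print(schema["function"]["parameters"])
```

The sketch shows why type hints and a well-formed Args section matter: without them, the generated schema has nothing to put in the type and description fields.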

Full function calling sequence

This section demonstrates a three-stage cycle for connecting the model to external tools: the model's turn, in which it generates function call objects; the developer's turn, in which you parse the call and execute code (such as a weather API); and the final response, in which the model uses the tool output to answer the user.

Model's turn

Given the user prompt "Hey, what's the weather in Tokyo right now?" and the tool list [get_current_weather], Gemma generates a function call object as follows.

# Define a function that our model can use.
def get_current_weather(location: str, unit: str = "celsius"):
    """
    Gets the current weather in a given location.

    Args:
        location: The city and state, e.g. "San Francisco, CA" or "Tokyo, JP"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])

    Returns:
        temperature: The current temperature in the given location
        weather: The current weather in the given location
    """
    return {"temperature": 15, "weather": "sunny"}

prompt = "Hey, what's the weather in Tokyo right now?"
tools = [get_current_weather]

message = [
    {
        "role": "system", "content": "You are a helpful assistant."
    },
    {
        "role": "user", "content": prompt
    },
]

text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
generated_tokens = out[0][len(inputs["input_ids"][0]):]
output = processor.decode(generated_tokens, skip_special_tokens=False)

print(f"Prompt: {prompt}")
print(f"Tools: {tools}")
print(f"Output: {output}")

Prompt: Hey, what's the weather in Tokyo right now?
Tools: [<function get_current_weather at 0x7cef824ece00>]
Output: <|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|><|tool_response>

Developer's turn

Your application must parse the model's response to extract the function name and arguments, then append tool_calls and tool_responses in a message with the assistant role.

Note: Always validate the function name and arguments before execution.

import re
import json

def extract_tool_calls(text):
    def cast(v):
        # Convert a raw argument string to int, float, bool, or a plain string.
        try:
            return int(v)
        except ValueError:
            try:
                return float(v)
            except ValueError:
                return {'true': True, 'false': False}.get(v.lower(), v.strip("'\""))

    return [{
        "name": name,
        "arguments": {
            k: cast((v1 or v2).strip())
            for k, v1, v2 in re.findall(r'(\w+):(?:<\|"\|>(.*?)<\|"\|>|([^,}]*))', args)
        }
    } for name, args in re.findall(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", text, re.DOTALL)]

calls = extract_tool_calls(output)
if calls:
    # Call the function and get the result
    #####################################
    # WARNING: This is a demonstration. #
    #####################################
    # Using globals() to call functions dynamically can be dangerous in
    # production. In a real application, you should implement a secure way to
    # map function names to actual function calls, such as a predefined
    # dictionary of allowed tools and their implementations.
    results = [
        {"name": c['name'], "response": globals()[c['name']](**c['arguments'])}
        for c in calls
    ]

    message.append({
        "role": "assistant",
        "tool_calls": [
            {"function": call} for call in calls
        ],
        "tool_responses": results
    })
    print(json.dumps(message[-1], indent=2))

{
  "role": "assistant",
  "tool_calls": [
    {
      "function": {
        "name": "get_current_weather",
        "arguments": {
          "location": "Tokyo, JP"
        }
      }
    }
  ],
  "tool_responses": [
    {
      "name": "get_current_weather",
      "response": {
        "temperature": 15,
        "weather": "sunny"
      }
    }
  ]
}
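The warning in the code above recommends a predefined dictionary of allowed tools instead of globals(). A minimal sketch of that approach follows. The names ALLOWED_TOOLS and run_tool_call are hypothetical, the weather function is a stand-in for the one defined earlier, and returning an error dictionary to the model (rather than raising) is one possible design choice, not a requirement of the library.

```python
import inspect


def get_current_weather(location: str, unit: str = "celsius"):
    """Stand-in for the tool defined earlier in this guide."""
    return {"temperature": 15, "weather": "sunny"}


# Explicit allowlist: only functions registered here can ever be executed.
ALLOWED_TOOLS = {"get_current_weather": get_current_weather}


def run_tool_call(call, registry=ALLOWED_TOOLS):
    """Execute one parsed call of the form {"name": ..., "arguments": {...}}."""
    name = call.get("name")
    arguments = call.get("arguments", {})

    func = registry.get(name)
    if func is None:
        # Unknown names are rejected instead of being looked up dynamically.
        return {"name": name, "response": {"error": f"unknown tool: {name}"}}

    try:
        # bind() raises TypeError for missing or unexpected arguments
        # without actually calling the function.
        inspect.signature(func).bind(**arguments)
    except TypeError as exc:
        return {"name": name, "response": {"error": f"bad arguments: {exc}"}}

    return {"name": name, "response": func(**arguments)}


ok = run_tool_call({"name": "get_current_weather", "arguments": {"location": "Tokyo, JP"}})
rejected = run_tool_call({"name": "delete_everything", "arguments": {}})
print(ok)
print(rejected)
```

Each returned dictionary already has the name/response shape used for tool_responses, so the results list in the example above could be built as [run_tool_call(c) for c in calls].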

Note: For best results, append the tool execution result to the message history in the specific format below. This ensures that the chat template generates the correct token structure (for example, response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}).

"tool_responses": [
  {
    "name": function_name,
    "response": function_response
  }
]

For multiple independent requests:

"tool_responses": [
  {
    "name": function_name_1,
    "response": function_response_1
  },
  {
    "name": function_name_2,
    "response": function_response_2
  }
]
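To illustrate how a tool_responses entry maps onto the token structure quoted in the note, the sketch below renders a flat response dictionary by hand. It is only an illustration inferred from the sample outputs in this guide: the chat template performs this rendering for you, the function names are hypothetical, and only strings and numbers are handled because the guide does not show how other value types are rendered.

```python
STRING_DELIM = '<|"|>'  # string delimiter token seen in the sample outputs


def render_value(value):
    """Render one flat value the way the sample outputs show it."""
    if isinstance(value, str):
        return f"{STRING_DELIM}{value}{STRING_DELIM}"
    # bool is a subclass of int, so exclude it explicitly rather than guess.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    raise TypeError(f"unsupported value type in this sketch: {type(value).__name__}")


def render_tool_response(entry):
    """Render one tool_responses entry whose response is a flat dict."""
    fields = ",".join(f"{key}:{render_value(val)}" for key, val in entry["response"].items())
    return f"response:{entry['name']}{{{fields}}}"


entry = {
    "name": "get_current_weather",
    "response": {"temperature": 15, "weather": "sunny"},
}
print(render_tool_response(entry))
# prints: response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}
```

In practice, keep passing the structured tool_responses list to apply_chat_template() and let the template produce this text.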

Final response

Finally, Gemma reads the tool response and replies to the user.

text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
generated_tokens = out[0][len(inputs["input_ids"][0]):]
output = processor.decode(generated_tokens, skip_special_tokens=True)
print(f"Output: {output}")
message[-1]["content"] = output

Output: The current weather in Tokyo is 15 degrees and sunny.

You can see the full chat history below.

# full history
print(json.dumps(message, indent=2))

print("-"*80)
output = processor.decode(out[0], skip_special_tokens=False)
print(f"Output: {output}")

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Hey, what's the weather in Tokyo right now?"
  },
  {
    "role": "assistant",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {
            "location": "Tokyo, JP"
          }
        }
      }
    ],
    "tool_responses": [
      {
        "name": "get_current_weather",
        "response": {
          "temperature": 15,
          "weather": "sunny"
        }
      }
    ],
    "content": "The current weather in Tokyo is 15 degrees and sunny."
  }
]
--------------------------------------------------------------------------------
Output: <bos><|turn>system
You are a helpful assistant.<|tool>declaration:get_current_weather{description:<|"|>Gets the current weather in a given location.<|"|>,parameters:{properties:{location:{description:<|"|>The city and state, e.g. "San Francisco, CA" or "Tokyo, JP"<|"|>,type:<|"|>STRING<|"|>},unit:{description:<|"|>The unit to return the temperature in.<|"|>,enum:[<|"|>celsius<|"|>,<|"|>fahrenheit<|"|>],type:<|"|>STRING<|"|>} },required:[<|"|>location<|"|>],type:<|"|>OBJECT<|"|>} }<tool|><turn|>
<|turn>user
Hey, what's the weather in Tokyo right now?<turn|>
<|turn>model
<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|><|tool_response>response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>The current weather in Tokyo is 15 degrees and sunny.<turn|>

Function calling with thinking

By using an internal reasoning process, the model calls functions considerably more accurately. It makes better decisions about when to trigger a tool and how to set that tool's parameters.

prompt = "Hey, I'm in Seoul. Is it good for running now?"
message = [
    {
        "role": "system", "content": "You are a helpful assistant."
    },
    {
        "role": "user", "content": prompt
    },
]

text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

out = model.generate(**inputs, max_new_tokens=1024)
output = processor.decode(out[0][input_len:], skip_special_tokens=False)
result = processor.parse_response(output)

for key, value in result.items():
  if key == "role":
    print(f"Role: {value}")
  elif key == "thinking":
    print(f"\n=== Thoughts ===\n{value}")
  elif key == "content":
    print(f"\n=== Answer ===\n{value}")
  elif key == "tool_calls":
    print(f"\n=== Tool Calls ===\n{value}")
  else:
    print(f"\n{key}: {value}...\n")

Role: assistant

=== Thoughts ===

1.  **Analyze the Request:** The user is asking if it's "good for running now" in "Seoul".

2.  **Identify Necessary Information:** To determine if it's good for running, I need current weather information (temperature, precipitation, etc.) for Seoul.

3.  **Examine Available Tools:** The available tool is `get_current_weather(location, unit)`.

4.  **Determine Tool Arguments:**
    *   `location`: The user specified "Seoul".
    *   `unit`: The user did not specify a unit (Celsius or Fahrenheit).

5.  **Formulate the Tool Call:** I need to call `get_current_weather` with the location. Since the user didn't specify a unit, I can either omit it (if the tool defaults are acceptable) or choose a common one. However, the tool definition requires `location` but `unit` is optional.

6.  **Construct the Response Strategy:**
    *   Call the tool to get the weather data for Seoul.
    *   Once the data is received, I can advise the user on whether it's suitable for running.

7.  **Generate Tool Call:**

    ```json
    {
      "toolSpec": {
        "name": "get_current_weather",
        "args": {
          "location": "Seoul"
        }
      }
    }
    ```
    (Self-correction: The `unit` parameter is optional in the definition, so just providing the location is sufficient to proceed.)

8.  **Final Output Generation:** Present the tool call to the user/system.

=== Tool Calls ===
[{'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': {'location': 'Seoul'} } }]
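Because the parsed result already contains structured tool calls, the assistant message can also be built from it directly rather than from the raw text. The following is a minimal sketch that assumes the result dictionary has exactly the shape printed above; the helper name assistant_message_from_parsed and the registry argument are hypothetical, and the weather function is a stand-in for the one defined earlier.

```python
def get_current_weather(location: str, unit: str = "celsius"):
    """Stand-in for the tool defined earlier in this guide."""
    return {"temperature": 15, "weather": "sunny"}


def assistant_message_from_parsed(result, registry):
    """Build the assistant turn from a parsed response and run its tool calls."""
    tool_calls = result.get("tool_calls") or []
    responses = []
    for call in tool_calls:
        function = call["function"]
        # Raises KeyError for any tool name that was not registered.
        tool = registry[function["name"]]
        responses.append(
            {"name": function["name"], "response": tool(**function["arguments"])}
        )
    return {
        "role": "assistant",
        "tool_calls": [{"function": call["function"]} for call in tool_calls],
        "tool_responses": responses,
    }


# Same shape as the parsed output shown above (thinking text shortened).
parsed = {
    "role": "assistant",
    "thinking": "...",
    "tool_calls": [
        {
            "type": "function",
            "function": {"name": "get_current_weather", "arguments": {"location": "Seoul"}},
        }
    ],
}

assistant_turn = assistant_message_from_parsed(
    parsed, {"get_current_weather": get_current_weather}
)
print(assistant_turn)
```

The resulting dictionary has the same role, tool_calls, and tool_responses keys as the message appended in the developer's turn earlier in this guide.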

Process the tool call and get the final answer.

calls = extract_tool_calls(output)
if calls:
    # Call the function and get the result
    #####################################
    # WARNING: This is a demonstration. #
    #####################################
    # Using globals() to call functions dynamically can be dangerous in
    # production. In a real application, you should implement a secure way to
    # map function names to actual function calls, such as a predefined
    # dictionary of allowed tools and their implementations.
    results = [
        {"name": c['name'], "response": globals()[c['name']](**c['arguments'])}
        for c in calls
    ]

    message.append({
        "role": "assistant",
        "tool_calls": [
            {"function": call} for call in calls
        ],
        "tool_responses": results
    })

text = processor.apply_chat_template(message, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
generated_tokens = out[0][len(inputs["input_ids"][0]):]
output = processor.decode(generated_tokens, skip_special_tokens=True)
print(f"Output: {output}")
message[-1]["content"] = output

print("-"*80)
print("Full History")
print("-"*80)
print(json.dumps(message, indent=2))

Output: The current weather in Seoul is 15 degrees Celsius and sunny. That sounds like great weather for a run!
--------------------------------------------------------------------------------
Full History
--------------------------------------------------------------------------------
[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Hey, I'm in Seoul. Is it good for running now?"
  },
  {
    "role": "assistant",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {
            "location": "Seoul"
          }
        }
      }
    ],
    "tool_responses": [
      {
        "name": "get_current_weather",
        "response": {
          "temperature": 15,
          "weather": "sunny"
        }
      }
    ],
    "content": "The current weather in Seoul is 15 degrees Celsius and sunny. That sounds like great weather for a run!"
  }
]

Important caveat: automatic vs. manual schemas

When you rely on automatic conversion from Python functions to JSON schema, the generated output may not always meet specific requirements for complex parameters.

If a function takes a custom object (such as a Config class) as an argument, the automatic converter may describe it only as a generic "object" without detailing its internal properties.

In these cases, define the JSON schema manually so that nested properties (such as theme or font_size inside a config object) are explicitly described to the model.

import json
from transformers.utils import get_json_schema

class Config:
    def __init__(self):
        self.theme = "light"
        self.font_size = 14

def update_config(config: Config):
    """
    Updates the configuration of the system.

    Args:
        config: A Config object

    Returns:
        True if the configuration was successfully updated, False otherwise.
    """

update_config_schema = {
    "type": "function",
    "function": {
        "name": "update_config",
        "description": "Updates the configuration of the system.",
        "parameters": {
            "type": "object",
            "properties": {
                "config": {
                    "type": "object",
                    "description": "A Config object",
                    "properties": {"theme": {"type": "string"}, "font_size": {"type": "number"} },
                    },
                },
            "required": ["config"],
            },
        },
    }

print(f"--- [Automatic] ---")
print(json.dumps(get_json_schema(update_config), indent=2))

print(f"\n--- [Manual Schemas] ---")
print(json.dumps(update_config_schema, indent=2))

--- [Automatic] ---
{
  "type": "function",
  "function": {
    "name": "update_config",
    "description": "Updates the configuration of the system.",
    "parameters": {
      "type": "object",
      "properties": {
        "config": {
          "type": "object",
          "description": "A Config object"
        }
      },
      "required": [
        "config"
      ]
    }
  }
}

--- [Manual Schemas] ---
{
  "type": "function",
  "function": {
    "name": "update_config",
    "description": "Updates the configuration of the system.",
    "parameters": {
      "type": "object",
      "properties": {
        "config": {
          "type": "object",
          "description": "A Config object",
          "properties": {
            "theme": {
              "type": "string"
            },
            "font_size": {
              "type": "number"
            }
          }
        }
      },
      "required": [
        "config"
      ]
    }
  }
}
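If writing nested schemas by hand becomes tedious, one option is to derive the nested properties from a dataclass. The sketch below assumes Config is rewritten as a dataclass with type annotations (the guide's version is a plain class), and the helper dataclass_schema is hypothetical rather than part of transformers. Note that it maps int to "integer", which is stricter than the "number" used in the manual schema above.

```python
from dataclasses import dataclass, fields, is_dataclass

# Map a few Python annotations to JSON Schema type names (illustrative subset).
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Config:
    theme: str = "light"
    font_size: int = 14


def dataclass_schema(cls, description=""):
    """Describe a dataclass as a JSON Schema object, recursing into nested dataclasses."""
    properties = {}
    for field in fields(cls):
        if is_dataclass(field.type):
            properties[field.name] = dataclass_schema(field.type)
        else:
            # Unknown annotations fall back to a generic "object".
            properties[field.name] = {"type": PY_TO_JSON.get(field.type, "object")}
    schema = {"type": "object", "properties": properties}
    if description:
        schema["description"] = description
    return schema


config_schema = dataclass_schema(Config, "A Config object")
print(config_schema)
```

The resulting dictionary can be placed under parameters.properties.config in a manual function schema like update_config_schema.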

Summary and next steps

You have seen how to build an application that can call functions with Gemma 4. The workflow follows a four-stage cycle:

Define tools: Create the functions your model can use, specifying their arguments and descriptions (for example, a weather lookup function).
Model's turn: The model receives the user's prompt and the list of available tools, and returns a structured function call object instead of plain text.
Developer's turn: The developer parses this output with a regular expression to extract the function name and arguments, executes the actual Python code, and appends the result to the chat history in an assistant-role message that carries tool_calls and tool_responses.
Final response: The model processes the tool's execution result to generate a final natural-language answer for the user.
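The four stages can be sketched as a single function. To keep the example runnable without a GPU, fake_generate is a hypothetical stand-in for the apply_chat_template(), model.generate(), and decode steps; the parser is a reduced version of extract_tool_calls that handles string arguments only, and answer_with_tools is an illustrative name, not a library function.

```python
import re


def get_current_weather(location: str, unit: str = "celsius"):
    """Stage 1: a tool the model may call (stand-in implementation)."""
    return {"temperature": 15, "weather": "sunny"}


def extract_tool_calls(text):
    """Parse tool-call blocks from model output (string arguments only)."""
    calls = []
    pattern = r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>"
    for name, args in re.findall(pattern, text, re.DOTALL):
        arguments = dict(re.findall(r'(\w+):<\|"\|>(.*?)<\|"\|>', args))
        calls.append({"name": name, "arguments": arguments})
    return calls


def fake_generate(messages):
    """Hypothetical stand-in for the real model call."""
    last = messages[-1]
    if "tool_responses" in last:
        weather = last["tool_responses"][0]["response"]
        return f"It is {weather['temperature']} degrees and {weather['weather']}."
    return '<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|>'


def answer_with_tools(generate, messages, registry):
    """Run one model turn, one developer turn, and one final response."""
    output = generate(messages)  # stage 2: model's turn
    calls = extract_tool_calls(output)
    if not calls:
        # The model answered directly without requesting a tool.
        messages.append({"role": "assistant", "content": output})
        return output

    # Stage 3: developer's turn. Only tools in the registry can run.
    messages.append({
        "role": "assistant",
        "tool_calls": [{"function": call} for call in calls],
        "tool_responses": [
            {"name": c["name"], "response": registry[c["name"]](**c["arguments"])}
            for c in calls
        ],
    })

    final = generate(messages)  # stage 4: final response
    messages[-1]["content"] = final
    return final


messages = [{"role": "user", "content": "Hey, what's the weather in Tokyo right now?"}]
answer = answer_with_tools(
    fake_generate, messages, {"get_current_weather": get_current_weather}
)
print(answer)
```

Passing the generation step in as a function keeps the control flow testable on its own; in a real application it would wrap the processor and model calls shown in this guide.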

See the following documentation for further reading.