查看全新 Gemini API 实战宝典和我们的社区论坛。

此页面由 Cloud Translation API 翻译。

Gemini API：使用 Python 函数调用

在 ai.google.dev 上查看

在 Google Colab 中运行

查看 GitHub 上的源代码

您可以为 Gemini 模型提供函数说明。模型可能会要求您调用一个函数并发回相应结果，以帮助模型处理您的查询。

初始设置

安装 Python SDK

适用于 Gemini API 的 Python SDK 包含在 google-generativeai 软件包中。使用 pip 安装依赖项：

pip install -U -q google-generativeai

导入软件包

导入必要的软件包。

import pathlib
import textwrap
import time

import google.generativeai as genai


from IPython import display
from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

设置您的 API 密钥

您必须先获取 API 密钥，然后才能使用 Gemini API。如果您还没有密钥，请在 Google AI Studio 中一键创建。

获取 API 密钥

在 Colab 中，将密钥添加到 Secret 管理器中左侧面板中的“🔑?”下。将其命名为 API_KEY。

获得 API 密钥后，将其传递给 SDK。可以通过以下两种方法实现此目的：

将密钥放在 GOOGLE_API_KEY 环境变量中（SDK 会自动从该变量中获取密钥）。
将密钥传递给 genai.configure(api_key=...)

try:
    # Used to securely store your API key
    from google.colab import userdata

    # Or use `os.getenv('API_KEY')` to fetch an environment variable.
    GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
except ImportError:
    import os
    GOOGLE_API_KEY = os.environ['GOOGLE_API_KEY']

genai.configure(api_key=GOOGLE_API_KEY)

函数基础知识

您可以在创建 genai.GenerativeModel 时将函数列表传递给 tools 实参。

重要提示 ：SDK 会将函数参数的类型注解转换为 API 能够识别的格式。该 API 仅支持有限的参数类型选择，而此自动转换仅支持其中的一部分：int | float | bool | str | list | dict

def multiply(a:float, b:float):
    """returns a * b."""
    return a*b

model = genai.GenerativeModel(model_name='gemini-1.0-pro',
                              tools=[multiply])

model

genai.GenerativeModel(
    model_name='models/gemini-1.0-pro',
    generation_config={},
    safety_settings={},
    tools=<google.generativeai.types.content_types.FunctionLibrary object at 0x10e73fe90>,
)

使用函数调用的推荐方法是通过聊天界面使用。主要原因在于，FunctionCalls 非常适合聊天的多轮结构。

chat = model.start_chat(enable_automatic_function_calling=True)

启用自动函数调用后，如果模型要求，chat.send_message 会自动调用您的函数。

它似乎只会返回包含正确答案的文本响应：

response = chat.send_message('I have 57 cats, each owns 44 mittens, how many mittens is that in total?')
response.text

'The total number of mittens is 2508.'

57*44

如果您查看 ChatSession.history，则会看到事件的顺序：

您发送了问题。
模型回复了 glm.FunctionCall。
genai.ChatSession 在本地执行了该函数，并向模型发回了一个 glm.FunctionResponse。
模型在其回答中使用了该函数输出。

for content in chat.history:
    part = content.parts[0]
    print(content.role, "->", type(part).to_dict(part))
    print('-'*80)

user -> {'text': 'I have 57 cats, each owns 44 mittens, how many mittens is that in total?'}
--------------------------------------------------------------------------------
model -> {'function_call': {'name': 'multiply', 'args': {'a': 57.0, 'b': 44.0} } }
--------------------------------------------------------------------------------
user -> {'function_response': {'name': 'multiply', 'response': {'result': 2508.0} } }
--------------------------------------------------------------------------------
model -> {'text': 'The total number of mittens is 2508.'}
--------------------------------------------------------------------------------

通常，状态图如下：

模型始终可以使用文本或 FunctionCall 进行回复。如果模型发送了 FunctionCall，用户必须使用 FunctionResponse 进行回复

模型可以在返回文本响应之前使用多个函数调用进行响应，并且函数调用先于文本响应。

虽然所有操作都是自动处理的，但如果您需要更多控制权，可以执行以下操作：

保留默认的 enable_automatic_function_calling=False，并自行处理 glm.FunctionCall 响应。
或者使用 GenerativeModel.generate_content，因为您还需要通过此应用管理聊天记录。

[可选] 低级别访问权限

从 Python 函数中自动提取架构并不适用于所有情况。例如：它不会处理您描述嵌套字典对象的字段的情况，但该 API 支持这一点。此 API 能够描述以下任意类型：

AllowedType = (int | float | bool | str | list['AllowedType'] | dict[str, AllowedType]

google.ai.generativelanguage 客户端库提供对低级别类型的访问权限，让您可以完全控制。

import google.ai.generativelanguage as glm

先了解一下模型的 _tools 属性，您可以看到它如何描述您传递给模型的函数：

def multiply(a:float, b:float):
    """returns a * b."""
    return a*b

model = genai.GenerativeModel(model_name='gemini-1.0-pro',
                             tools=[multiply])

model._tools.to_proto()

[function_declarations {
   name: "multiply"
   description: "returns a * b."
   parameters {
     type_: OBJECT
     properties {
       key: "b"
       value {
         type_: NUMBER
       }
     }
     properties {
       key: "a"
       value {
         type_: NUMBER
       }
     }
     required: "a"
     required: "b"
   }
 }]

这将返回将发送到 API 的 glm.Tool 对象的列表。如果您对输出的格式不熟悉，则是因为这些是 Google protobuf 类。每个 glm.Tool（在本例中为 1）都包含一个 glm.FunctionDeclarations 列表，该列表描述了函数及其实参。

以下是使用 glm 类编写的同一乘法函数的声明。

请注意，这些类只描述 API 的函数，不包含函数的实现。因此，这不适用于自动函数调用，但函数并不总是需要实现。

calculator = glm.Tool(
    function_declarations=[
      glm.FunctionDeclaration(
        name='multiply',
        description="Returns the product of two numbers.",
        parameters=glm.Schema(
            type=glm.Type.OBJECT,
            properties={
                'a':glm.Schema(type=glm.Type.NUMBER),
                'b':glm.Schema(type=glm.Type.NUMBER)
            },
            required=['a','b']
        )
      )
    ])

同样，您可以将其描述为一个与 JSON 兼容的对象：

calculator = {'function_declarations': [
      {'name': 'multiply',
       'description': 'Returns the product of two numbers.',
       'parameters': {'type_': 'OBJECT',
       'properties': {
         'a': {'type_': 'NUMBER'},
         'b': {'type_': 'NUMBER'} },
       'required': ['a', 'b']} }]}

glm.Tool(calculator)

function_declarations {
  name: "multiply"
  description: "Returns the product of two numbers."
  parameters {
    type_: OBJECT
    properties {
      key: "b"
      value {
        type_: NUMBER
      }
    }
    properties {
      key: "a"
      value {
        type_: NUMBER
      }
    }
    required: "a"
    required: "b"
  }
}

无论采用哪种方式，您都需要将 glm.Tool 的表示法或工具列表传递给

model = genai.GenerativeModel('gemini-pro', tools=calculator)
chat = model.start_chat()

response = chat.send_message(
    f"What's 234551 X 325552 ?",
)

与之前一样，模型会返回调用计算器的 multiply 函数的 glm.FunctionCall：

response.candidates

[index: 0
content {
  parts {
    function_call {
      name: "multiply"
      args {
        fields {
          key: "b"
          value {
            number_value: 325552
          }
        }
        fields {
          key: "a"
          value {
            number_value: 234551
          }
        }
      }
    }
  }
  role: "model"
}
finish_reason: STOP
]

自行执行该函数：

fc = response.candidates[0].content.parts[0].function_call
assert fc.name == 'multiply'

result = fc.args['a'] * fc.args['b']
result

76358547152.0

将结果发送给模型，以继续对话：

response = chat.send_message(
    glm.Content(
    parts=[glm.Part(
        function_response = glm.FunctionResponse(
          name='multiply',
          response={'result': result}))]))

摘要

SDK 支持基本的函数调用。请注意，使用聊天模式更易于管理，因为它可以自然地来回切换。您负责实际调用函数并将结果发送回模型，以便模型生成文本响应。