Tutorial: Function calling with the Gemini API


View on ai.google.dev Run in Google Colab View source on GitHub

Use function calling to define custom functions and pass them to Gemini. The model does not directly invoke these functions, but instead generates structured data output that specifies the function name and suggested arguments. This output enables the calling of external APIs, and the resulting API output can then be incorporated back into the model, allowing for more comprehensive query responses. Function calling empowers LLMs to interact with real-time information and various services, such as databases, customer relationship management systems, and document repositories, enhancing their ability to provide relevant and contextual answers. You can provide Gemini models with descriptions of functions. The model may ask you to call a function and send back the result to help the model handle your query.

If you haven't already, check out the Introduction to function calling to learn more.

Setup

Install the Python SDK

The Python SDK for the Gemini API is contained in the google-generativeai package. Install the dependency using pip:

pip install -U -q google-generativeai

Import packages

Import the necessary packages.

import pathlib
import textwrap
import time

import google.generativeai as genai

from IPython import display
from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

Set up your API key

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.

Get an API key

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name API_KEY.

Once you have the API key, pass it to the SDK. You can do this in two ways:

  • Put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there).
  • Pass the key to genai.configure(api_key=...)
try:
    # Used to securely store your API key
    from google.colab import userdata

    # Or use `os.getenv('API_KEY')` to fetch an environment variable.
    GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
except ImportError:
    import os
    GOOGLE_API_KEY = os.environ['GOOGLE_API_KEY']

genai.configure(api_key=GOOGLE_API_KEY)

Basics of function calling

To use function calling, pass a list of functions to the tools parameter when creating a GenerativeModel. The model uses the function name, docstring, parameters, and parameter type annotations to decide if it needs the function to best answer a prompt.

def multiply(a:float, b:float):
    """returns a * b."""
    return a*b

model = genai.GenerativeModel(model_name='gemini-1.5-flash',
                              tools=[multiply])

model
genai.GenerativeModel(
    model_name='models/gemini-1.5-flash',
    generation_config={},
    safety_settings={},
    tools=<google.generativeai.types.content_types.FunctionLibrary object at 0x10e73fe90>,
)

It is recommended to use function calls through the chat interface. This is because function calls naturally fit in to multi-turn chats as they capture the back-and-forth interaction between the user and model. The Python SDK's ChatSession is a great interface for chats because it handles the conversation history for you, and using the parameter enable_automatic_function_calling simplifies function calling even further:

chat = model.start_chat(enable_automatic_function_calling=True)

With automatic function calling enabled chat.send_message automatically calls your function if the model asks it to.

It appears to simply return a text response, containing the correct answer:

response = chat.send_message('I have 57 cats, each owns 44 mittens, how many mittens is that in total?')
response.text
'The total number of mittens is 2508.'
57*44
2508

Examine the chat history to see the flow of the conversation and how function calls are integrated within it.

The ChatSession.history property stores a chronological record of the conversation between the user and the Gemini model. Each turn in the conversation is represented by a glm.Content object, which contains the following information:

  • Role: Identifies whether the content originated from the "user" or the "model".
  • Parts: A list of glm.Part objects that represent individual components of the message. With a text-only model, these parts can be:
    • Text: Plain text messages.
    • Function Call (glm.FunctionCall): A request from the model to execute a specific function with provided arguments.
    • Function Response (glm.FunctionResponse): The result returned by the user after executing the requested function.

In the previous example with the mittens calculation, the history shows the following sequence:

  1. User: Asks the question about the total number of mittens.
  2. Model: Determines that the multiply function is helpful and sends a FunctionCall request to the user.
  3. User: The ChatSession automatically executes the function (due to enable_automatic_function_calling being set) and sends back a FunctionResponse with the calculated result.
  4. Model: Uses the function's output to formulate the final answer and presents it as a text response.
for content in chat.history:
    part = content.parts[0]
    print(content.role, "->", type(part).to_dict(part))
    print('-'*80)
user -> {'text': 'I have 57 cats, each owns 44 mittens, how many mittens is that in total?'}
--------------------------------------------------------------------------------
model -> {'function_call': {'name': 'multiply', 'args': {'a': 57.0, 'b': 44.0} } }
--------------------------------------------------------------------------------
user -> {'function_response': {'name': 'multiply', 'response': {'result': 2508.0} } }
--------------------------------------------------------------------------------
model -> {'text': 'The total number of mittens is 2508.'}
--------------------------------------------------------------------------------

In general the state diagram is:

The model can always reply with text, or a FunctionCall. Iff the model sends a FunctionCall the user must reply with a FunctionResponse

The model can respond with multiple function calls before returning a text response, and function calls come before the text response.

While this was all handled automatically, if you need more control, you can:

  • Leave the default enable_automatic_function_calling=False and process the glm.FunctionCall responses yourself.
  • Or use GenerativeModel.generate_content, where you also need to manage the chat history.

Parallel function calling

In addition to basic function calling described above, you can also call multiple functions in a single turn. This section shows an example for how you can use parallel function calling.

Define the tools.

def power_disco_ball(power: bool) -> bool:
    """Powers the spinning disco ball."""
    print(f"Disco ball is {'spinning!' if power else 'stopped.'}")
    return True


def start_music(energetic: bool, loud: bool, bpm: int) -> str:
    """Play some music matching the specified parameters.

    Args:
      energetic: Whether the music is energetic or not.
      loud: Whether the music is loud or not.
      bpm: The beats per minute of the music.

    Returns: The name of the song being played.
    """
    print(f"Starting music! {energetic=} {loud=}, {bpm=}")
    return "Never gonna give you up."


def dim_lights(brightness: float) -> bool:
    """Dim the lights.

    Args:
      brightness: The brightness of the lights, 0.0 is off, 1.0 is full.
    """
    print(f"Lights are now set to {brightness:.0%}")
    return True

Now call the model with an instruction that could use all of the specified tools.

# Set the model up with tools.
house_fns = [power_disco_ball, start_music, dim_lights]

model = genai.GenerativeModel(model_name="gemini-1.5-flash", tools=house_fns)

# Call the API.
chat = model.start_chat()
response = chat.send_message("Turn this place into a party!")

# Print out each of the function calls requested from this single call.
for part in response.parts:
    if fn := part.function_call:
        args = ", ".join(f"{key}={val}" for key, val in fn.args.items())
        print(f"{fn.name}({args})")
power_disco_ball(power=True)
start_music(energetic=True, loud=True, bpm=120.0)
dim_lights(brightness=0.3)

Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested.

# Simulate the responses from the specified tools.
responses = {
    "power_disco_ball": True,
    "start_music": "Never gonna give you up.",
    "dim_lights": True,
}

# Build the response parts.
response_parts = [
    glm.Part(function_response=glm.FunctionResponse(name=fn, response={"result": val}))
    for fn, val in responses.items()
]

response = chat.send_message(response_parts)
print(response.text)
Let's get this party started! I've turned on the disco ball, started playing some upbeat music, and dimmed the lights. 🎶✨  Get ready to dance! 🕺💃

(Optional) Low level access

The automatic extraction of the schema from python functions doesn't work in all cases. For example: it doesn't handle cases where you describe the fields of a nested dictionary-object, but the API does support this. The API is able to describe any of the following types:

AllowedType = (int | float | bool | str | list['AllowedType'] | dict[str, AllowedType]

The google.ai.generativelanguage client library provides access to the low level types giving you full control.

import google.ai.generativelanguage as glm

First peek inside the model's _tools attribute, you can see how it describes the function(s) you passed it to the model:

def multiply(a:float, b:float):
    """returns a * b."""
    return a*b

model = genai.GenerativeModel(model_name='gemini-1.5-flash',
                             tools=[multiply])

model._tools.to_proto()
[function_declarations {
   name: "multiply"
   description: "returns a * b."
   parameters {
     type_: OBJECT
     properties {
       key: "b"
       value {
         type_: NUMBER
       }
     }
     properties {
       key: "a"
       value {
         type_: NUMBER
       }
     }
     required: "a"
     required: "b"
   }
 }]

This returns the list of glm.Tool objects that would be sent to the API. If the printed format is not familiar, it's because these are Google protobuf classes. Each glm.Tool (1 in this case) contains a list of glm.FunctionDeclarations, which describe a function and its arguments.

Here is a declaration for the same multiply function written using the glm classes.

Note that these classes just describe the function for the API, they don't include an implementation of it. So using this doesn't work with automatic function calling, but functions don't always need an implementation.

calculator = glm.Tool(
    function_declarations=[
      glm.FunctionDeclaration(
        name='multiply',
        description="Returns the product of two numbers.",
        parameters=glm.Schema(
            type=glm.Type.OBJECT,
            properties={
                'a':glm.Schema(type=glm.Type.NUMBER),
                'b':glm.Schema(type=glm.Type.NUMBER)
            },
            required=['a','b']
        )
      )
    ])

Equivalently, you can describe this as a JSON-compatible object:

calculator = {'function_declarations': [
      {'name': 'multiply',
       'description': 'Returns the product of two numbers.',
       'parameters': {'type_': 'OBJECT',
       'properties': {
         'a': {'type_': 'NUMBER'},
         'b': {'type_': 'NUMBER'} },
       'required': ['a', 'b']} }]}
glm.Tool(calculator)
function_declarations {
  name: "multiply"
  description: "Returns the product of two numbers."
  parameters {
    type_: OBJECT
    properties {
      key: "b"
      value {
        type_: NUMBER
      }
    }
    properties {
      key: "a"
      value {
        type_: NUMBER
      }
    }
    required: "a"
    required: "b"
  }
}

Either way, you pass a representation of a glm.Tool or list of tools to

model = genai.GenerativeModel('gemini-1.5-flash', tools=calculator)
chat = model.start_chat()

response = chat.send_message(
    f"What's 234551 X 325552 ?",
)

Like before the model returns a glm.FunctionCall invoking the calculator's multiply function:

response.candidates
[index: 0
content {
  parts {
    function_call {
      name: "multiply"
      args {
        fields {
          key: "b"
          value {
            number_value: 325552
          }
        }
        fields {
          key: "a"
          value {
            number_value: 234551
          }
        }
      }
    }
  }
  role: "model"
}
finish_reason: STOP
]

Execute the function yourself:

fc = response.candidates[0].content.parts[0].function_call
assert fc.name == 'multiply'

result = fc.args['a'] * fc.args['b']
result
76358547152.0

Send the result to the model, to continue the conversation:

response = chat.send_message(
    glm.Content(
    parts=[glm.Part(
        function_response = glm.FunctionResponse(
          name='multiply',
          response={'result': result}))]))

Summary

Basic function calling is supported in the SDK. Remember that it is easier to manage using chat-mode, because of the natural back and forth structure. You're in charge of actually calling the functions and sending results back to the model so it can produce a text-response.