Gemini API: Basic function calling with Python


Setup

Install the Python SDK

The Python SDK for the Gemini API is contained in the google-generativeai package. Install the dependency using pip:

pip install -U google-generativeai

Import packages

Import the necessary packages.

import pathlib
import textwrap

import google.generativeai as genai

# Used to securely store your API key
from google.colab import userdata

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

Set up your API key

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.


In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name API_KEY.

Once you have the API key, pass it to the SDK. You can do this in two ways:

  • Put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there).
  • Pass the key to genai.configure(api_key=...)

# Or use `os.getenv('API_KEY')` to fetch an environment variable.
API_KEY = userdata.get('API_KEY')

genai.configure(api_key=API_KEY)

Function calls

The google.ai.generativelanguage client library provides access to the low-level types required for function calling.

import google.ai.generativelanguage as glm

A glm.Tool contains a list of glm.FunctionDeclarations. These just describe the function; they don't implement it.

datetime = glm.Tool(
    function_declarations=[
      glm.FunctionDeclaration(
        name='now',
        description="Returns the current UTC date and time."
      )
    ]
)
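The declaration only describes the function; implementing it is up to you. Here is a minimal sketch of a matching implementation (the strftime format is an assumption, chosen to mirror the timestamp string used later in this tutorial):

```python
import datetime as dt

def now():
    """Return the current UTC date and time as a string.

    Matches the 'now' declaration above: no arguments, returns a timestamp.
    """
    return dt.datetime.now(dt.timezone.utc).strftime('%a %b %d %I:%M:%S %p UTC %Y')

# Map each declared function name to its Python implementation,
# so a returned function_call can later be dispatched by name.
implementations = {'now': now}
```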

Pass a list of tools to the genai.GenerativeModel constructor to give the model access:

model = genai.GenerativeModel(
    'gemini-pro',
    tools=[datetime])

For this basic tool support, use chat mode, since tools require multiple rounds of back and forth.

chat = model.start_chat()

response = chat.send_message(
    'How many days until Christmas?',
)

When the model needs to call a tool to answer a question it returns a glm.Part containing a function_call instead of a text attribute:

response.candidates
[index: 0
content {
  parts {
    function_call {
      name: "now"
      args {
      }
    }
  }
  role: "model"
}
finish_reason: STOP
]
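Since a reply can carry either text or a function call, it helps to check which one you got before accessing response.text. The helper below is a sketch written against the candidate structure shown above; treating an empty function_call name as "this is a text part" is an assumption:

```python
def get_function_call(response):
    """Return the function_call from the first candidate, or None for a text reply.

    A sketch: assumes the response structure shown above, and that a part
    without a function call has an empty function_call.name.
    """
    part = response.candidates[0].content.parts[0]
    if part.function_call.name:
        return part.function_call
    return None
```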

Reply with a glm.Part containing a glm.FunctionResponse to allow the model to finish the answer:

response = chat.send_message(
  glm.Content(
    parts=[glm.Part(
        function_response=glm.FunctionResponse(
          name='now',
          response={'datetime': 'Sun Dec 5 03:33:56 PM UTC 2023'}
        )
    )]
  )
)

The model may respond with either a text response or another glm.FunctionCall:

response.text
' Okay, Christmas this year, 2023, is on Monday, December 25th. That makes it 20 days from now.'

That datetime tool only contained a single function, which takes no arguments. Next, try something more complex.

LLMs are, generally, not 100% accurate at arithmetic:

model = genai.GenerativeModel('gemini-pro')
chat = model.start_chat()

a = 2312371
b = 234234

response = chat.send_message(
    f"What's {a} X {b} ?",

)
print(response.text)
549899573314
a*b
541635908814

Sometimes it's off by ~1%, sometimes it's off by 10X.

error_percent = (a*b - int(response.text.replace(',', '')))/(a*b) * 100

print(f"Error: {error_percent:.2f}%")
Error: -1.53%

So, describe a calculator as a glm.Tool:

calculator = glm.Tool(
    function_declarations=[
      glm.FunctionDeclaration(
        name='add',
        description="Returns the sum of two numbers.",
        parameters=glm.Schema(
            type=glm.Type.OBJECT,
            properties={
                'a': glm.Schema(type=glm.Type.NUMBER),
                'b': glm.Schema(type=glm.Type.NUMBER)
            },
            required=['a','b']
        )
      ),
      glm.FunctionDeclaration(
        name='multiply',
        description="Returns the product of two numbers.",
        parameters=glm.Schema(
            type=glm.Type.OBJECT,
            properties={
                'a': glm.Schema(type=glm.Type.NUMBER),
                'b': glm.Schema(type=glm.Type.NUMBER)
            },
            required=['a','b']
        )
      )
    ])
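As with the datetime tool, the declarations above need matching Python implementations, plus a small dispatcher to route whichever function the model calls. This is a sketch; converting the returned args to keyword arguments with `dict(...)` is an assumption about the SDK's map type, which behaves like a mapping:

```python
def add(a, b):
    """Sum of two numbers, matching the 'add' declaration."""
    return a + b

def multiply(a, b):
    """Product of two numbers, matching the 'multiply' declaration."""
    return a * b

# Map declared names to local implementations.
calculator_functions = {'add': add, 'multiply': multiply}

def execute(function_call):
    """Look up the named function and call it with the model-supplied arguments."""
    func = calculator_functions[function_call.name]
    return func(**dict(function_call.args))
```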

Give the model the calculator and ask again:

model = genai.GenerativeModel('gemini-pro', tools=[calculator])
chat = model.start_chat()

response = chat.send_message(
    f"What's {a} X {b} ?",
)

Now instead of guessing at the answer the model returns a glm.FunctionCall invoking the calculator's multiply function:

response.candidates
[index: 0
content {
  parts {
    function_call {
      name: "multiply"
      args {
        fields {
          key: "b"
          value {
            number_value: 234234
          }
        }
        fields {
          key: "a"
          value {
            number_value: 2312371
          }
        }
      }
    }
  }
  role: "model"
}
finish_reason: STOP
]

Execute the function yourself:

fc = response.candidates[0].content.parts[0].function_call
assert fc.name == 'multiply'

result = fc.args['a'] * fc.args['b']
result
541635908814.0

Send the result to the model, to continue the conversation:

response = chat.send_message(
  glm.Content(
    parts=[glm.Part(
        function_response=glm.FunctionResponse(
          name='multiply',
          response={'result': result}
        )
    )]
  )
)
response.text
' 541636000000'

Summary

Basic function calling is supported in the SDK. Remember that it is easier to manage using chat mode because of its natural back-and-forth structure. You're in charge of actually calling the functions and sending the results back to the model so it can produce a text response.