Tools extend the capabilities of Gemini models, enabling them to take action in the world, access real-time information, and perform complex computational tasks. Models can use tools in both standard request-response interactions and real-time streaming sessions via the Live API.
The Gemini API provides a suite of fully managed, built-in tools optimized for Gemini models, or you can define your own custom tools using function calling.
Available built-in tools
| Tool | Description | Use Cases |
|---|---|---|
| Google Search | Ground responses in current events and facts from the web to reduce hallucinations. | - Answering questions about recent events - Verifying facts with diverse sources |
| Google Maps | Build location-aware assistants that can find places, get directions, and provide rich local context. | - Planning travel itineraries with multiple stops - Finding local businesses based on user criteria |
| Code Execution | Allow the model to write and run Python code to solve math problems or process data accurately. | - Solving complex mathematical equations - Processing and analyzing text data precisely |
| URL Context | Direct the model to read and analyze content from specific web pages or documents. | - Answering questions based on specific URLs or documents - Retrieving information across different web pages |
| Computer Use (Preview) | Enable Gemini to view a screen and generate actions to interact with web browser UIs (Client-side execution). | - Automating repetitive web-based workflows - Testing web application user interfaces |
| File Search | Index and search your own documents to enable Retrieval Augmented Generation (RAG). | - Searching technical manuals - Question answering over proprietary data |
See the Pricing page for details on costs associated with specific tools.
How tool execution works
Tools allow the model to request actions during a conversation. The flow differs depending on whether the tool is built-in (managed by Google) or custom (managed by you).
Built-in tool flow
For built-in tools like Google Search or Code Execution, the entire process happens within one API call:
- You send a prompt: "What is the square root of the latest stock price of GOOG?"
- Gemini decides it needs tools and executes them on Google's servers (e.g., searches for the stock price, then runs Python code to calculate the square root).
- Gemini sends back the final answer grounded in the tool results.
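As a sketch, the request body for this single call over the REST API might look like the following. The field names follow the v1beta `generateContent` shape, but treat the exact spellings as assumptions to verify against the current API reference:

```python
import json

# Sketch of a generateContent request body that enables two built-in tools.
# Google executes these tools server-side; your app only sends one request.
request_body = {
    "contents": [
        {"parts": [{"text": "What is the square root of the latest stock price of GOOG?"}]}
    ],
    # Built-in tools are declared here, not implemented by your application.
    "tools": [
        {"google_search": {}},   # grounding via Google Search
        {"code_execution": {}},  # server-side Python execution
    ],
}

print(json.dumps(request_body, indent=2))
```

The response then arrives already grounded in the tool results, with no intermediate round trip on your side.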
Custom tool flow (Function Calling)
For custom tools and Computer Use, your application handles the execution:
- You send a prompt along with function (tool) declarations.
- Gemini might send back structured JSON to call a specific function, for example `{"name": "get_order_status", "args": {"order_id": "123"}}`.
- You execute the function in your application or environment.
- You send the function results back to Gemini.
- Gemini uses the results to generate a final response or another tool call.
Learn more in the Function calling guide.
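The steps above can be sketched with a local stand-in for the model. Here `get_order_status` and the canned tool-call JSON are hypothetical, standing in for a real function declaration and a real Gemini response:

```python
import json

# Hypothetical local tool; in a real app this might query your order database.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

# Registry mapping declared function names to implementations (step 1).
TOOLS = {"get_order_status": get_order_status}

# Stand-in for the structured function call Gemini might return (step 2).
model_tool_call = json.loads('{"name": "get_order_status", "args": {"order_id": "123"}}')

# Step 3: your application executes the requested function.
fn = TOOLS[model_tool_call["name"]]
result = fn(**model_tool_call["args"])

# Step 4: package the result to send back to the model as a function response.
function_response = {
    "functionResponse": {"name": model_tool_call["name"], "response": result}
}
print(function_response)
```

In a real application this loop repeats: the model may issue another tool call (step 5) until it has enough information to produce a final text response.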
Structured outputs vs. function calling
Gemini offers two methods for generating structured outputs. Use function calling when the model needs to perform an intermediate step by connecting to your own tools or data systems. Use structured outputs when you strictly need the model's final response to adhere to a specific schema, such as for rendering a custom UI.
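To illustrate the structured-output side, here is a hedged sketch of a generation config that constrains the final response to a schema. The camelCase field names and the schema shape follow the REST API convention, but verify them against the current documentation; the product schema itself is invented for illustration:

```python
import json

# Sketch of a structured-output generation config: no tool round trip occurs;
# the model's final response itself must conform to this schema.
generation_config = {
    "responseMimeType": "application/json",
    "responseSchema": {
        "type": "OBJECT",
        "properties": {
            "product_name": {"type": "STRING"},
            "price_usd": {"type": "NUMBER"},
        },
        "required": ["product_name", "price_usd"],
    },
}

print(json.dumps(generation_config, indent=2))
```

Because the schema constrains the final answer rather than an intermediate step, this is the right fit when downstream code (such as a UI renderer) must parse the response directly.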
Building agents
Agents are systems that use models and tools to complete multi-step tasks. While Gemini provides the reasoning capabilities (the "brain") and the essential tools (the "hands"), you often need an orchestration framework to manage the agent's memory, planning loops, and complex tool chaining.
Gemini integrates with leading open-source agent frameworks:
- LangChain / LangGraph: Build stateful, complex application flows and multi-agent systems using graph structures.
- LlamaIndex: Connect Gemini agents to your private data for RAG-enhanced workflows.
- CrewAI: Orchestrate collaborative, role-playing autonomous AI agents.
- Vercel AI SDK: Build AI-powered user interfaces and agents in JavaScript/TypeScript.
- Google ADK: An open-source framework for building and orchestrating interoperable AI agents.