This guide shows you how to migrate from the generateContent API to the Interactions API.
The Interactions API is the standard interface for building with Gemini. It is optimized for agentic workflows, server-side state management, and complex multimodal, multi-turn conversations, while still supporting simple stateless single-turn requests. generateContent remains fully supported, but we recommend the Interactions API for all new development.
Why migrate?
The Interactions API offers a more structured and more powerful way to work with Gemini:
- Server-side history management: Simplified multi-turn flows via previous_interaction_id. The server enables state by default (store=true), but you can opt into stateless behavior by setting store=false.
- Observable execution steps: Typed steps make it easy to debug complex flows and to render UI for intermediate events (such as thoughts or search widgets).
- Built for agentic workflows: Native support for multi-step tool use, orchestration, and complex reasoning flows through typed execution steps.
- Long-running operations and background tasks: Supports offloading time-consuming operations such as Deep Think and Deep Research to background processes via background=true.
- Access to new models and capabilities: Going forward, new models beyond the core family, as well as new agentic capabilities and tools, will be introduced exclusively through the Interactions API.
generateContent remains fully supported for existing use cases.
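As a sketch of how these options fit together, the request bodies below combine the fields described above. Field names (store, background, previous_interaction_id) follow this guide's examples and may evolve with the preview API; int_123 is a placeholder ID.

```python
# Stateless request: opt out of server-side storage.
stateless_request = {
    "model": "gemini-3-flash-preview",
    "input": "Tell me a joke.",
    "store": False,
}

# Background request: offload a long-running task.
background_request = {
    "model": "gemini-3-flash-preview",
    "input": "Research this topic in depth.",
    "background": True,
}

# Follow-up turn: reference a stored interaction instead of resending history.
follow_up_request = {
    "model": "gemini-3-flash-preview",
    "previous_interaction_id": "int_123",
    "input": "Continue.",
}
```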
Basic input/output
This section shows how to migrate a basic text generation request.
Before (generateContent)
The generateContent API is stateless and returns the response directly. The response structure wraps the output in a list of candidates, each containing content with a list of parts to parse.
Python
from google import genai
client = genai.Client()
response = client.models.generate_content(
model="gemini-2.5-flash", contents="Tell me a joke."
)
print(response.text)
JavaScript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
model: "gemini-2.5-flash",
contents: "Tell me a joke.",
});
console.log(response.text);
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [{
"parts": [{
"text": "Tell me a joke."
}]
}]
}'
# Response
{
"candidates": [
{
"content": {
"parts": [
{
"text": "Why did the chicken cross the road? To get to the other side!"
}
],
"role": "model"
},
"finishReason": "STOP",
"index": 0
}
],
"usageMetadata": {
"promptTokenCount": 4,
"candidatesTokenCount": 12,
"totalTokenCount": 16
}
}
After (Interactions API)
The Interactions API returns a stored interaction resource with a steps timeline. Instead of iterating over candidates and parts, inspect the steps array for the output type you need.
Python
from google import genai
client = genai.Client()
# The input can be a simple string shorthand
interaction = client.interactions.create(
model="gemini-3-flash-preview", input="Tell me a joke."
)
# Inspect the steps manually
for step in interaction.steps:
    if step.type == "model_output":
        print(step.content[0].text)
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
let interaction = await client.interactions.create({
model: 'gemini-3-flash-preview',
input: 'Tell me a joke.'
});
// Manual inspection
const modelStep = interaction.steps.find(s => s.type === 'model_output');
console.log(modelStep.content[0].text);
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"input": "Tell me a joke."
}'
# Response
{
"id": "int_123",
"status": "completed",
"steps": [
{
"type": "user_input",
"status": "done",
"content": [
{
"type": "text",
"text": "Tell me a joke."
}
]
},
{
"type": "model_output",
"status": "done",
"content": [
{
"type": "text",
"text": "Why did the chicken cross the road?"
}
]
}
]
}
Multi-turn conversations
In the Interactions API, interactions are stored by default, enabling server-side state management for multi-turn conversations.
Before (generateContent)
With generateContent, you must manage the conversation history manually using the contents array or a client-side chat helper.
Python
Using the chat helper (recommended)
from google import genai
client = genai.Client()
chat = client.chats.create(model="gemini-2.5-flash")
response1 = chat.send_message("Hi, my name is Phil.")
print(response1.text)
response2 = chat.send_message("What is my name?")
print(response2.text)
Managing history manually
from google import genai
from google.genai import types
client = genai.Client()
# The second turn requires sending the entire history
response = client.models.generate_content(
model="gemini-2.5-flash",
contents=[
types.Content(
role="user", parts=[types.Part.from_text("Hi, my name is Phil.")]
),
types.Content(
role="model",
parts=[types.Part.from_text("Hi Phil, how can I help you?")],
),
types.Content(
role="user", parts=[types.Part.from_text("What is my name?")]
),
],
)
print(response.text)
JavaScript
Using the chat helper (recommended)
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
const chat = client.chats.create({ model: 'gemini-2.5-flash' });
let response = await chat.sendMessage({ message: 'Hi, my name is Phil.' });
console.log(response.text);
response = await chat.sendMessage({ message: 'What is my name?' });
console.log(response.text);
Managing history manually
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
// The second turn requires sending the entire history
const response = await client.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
{ role: 'user', parts: [{ text: 'Hi, my name is Phil.' }] },
{ role: 'model', parts: [{ text: 'Hi Phil, how can I help you?' }] },
{ role: 'user', parts: [{ text: 'What is my name?' }] }
]
});
console.log(response.text);
REST
# Request (the second turn requires sending the entire history)
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [
{"role": "user", "parts": [{"text": "Hi, my name is Phil."}]},
{"role": "model", "parts": [{"text": "Hi Phil, how can I help you?"}]},
{"role": "user", "parts": [{"text": "What is my name?"}]}
]
}'
# Response
{
"candidates": [
{
"content": {
"parts": [
{
"text": "Your name is Phil."
}
],
"role": "model"
},
"finishReason": "STOP",
"index": 0
}
]
}
After (Interactions API)
The Interactions API manages state on the server. You continue a conversation by referencing the previous_interaction_id.
Python
from google import genai
client = genai.Client()
# First turn
interaction1 = client.interactions.create(
model="gemini-3-flash-preview", input="Hi, my name is Phil."
)
print(interaction1.steps[-1].content[0].text)
# Second turn (passing previous_interaction_id)
interaction2 = client.interactions.create(
model="gemini-3-flash-preview",
previous_interaction_id=interaction1.id,
input="What is my name?",
)
print(interaction2.steps[-1].content[0].text)
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
// First turn
let interaction = await client.interactions.create({
model: 'gemini-3-flash-preview',
input: 'Hi, my name is Phil.'
});
console.log(interaction.steps.at(-1).content[0].text);
// Second turn
interaction = await client.interactions.create({
model: 'gemini-3-flash-preview',
previous_interaction_id: interaction.id,
input: 'What is my name?'
});
console.log(interaction.steps.at(-1).content[0].text);
REST
# First Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"input": "Hi, my name is Phil."
}'
# Second Request (using ID from first response)
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"previous_interaction_id": "int_123",
"input": "What is my name?"
}'
# Response to Second Request
{
"id": "int_123",
"steps": [
{
"type": "user_input",
"status": "done",
"content": [{ "type": "text", "text": "Hi, my name is Phil." }]
},
{
"type": "model_output",
"status": "done",
"content": [{ "type": "text", "text": "Hello Phil! How can I help you today?" }]
},
{
"type": "user_input",
"status": "done",
"content": [{ "type": "text", "text": "What is my name?" }]
},
{
"type": "model_output",
"status": "done",
"content": [{ "type": "text", "text": "Your name is Phil." }]
}
]
}
Multimodal input
Both APIs support multimodal input (text, images, video, and so on).
Before (generateContent)
With generateContent, you pass a list of parts in the contents array. The response returns the output in the parts of the first candidate.
Python
from google import genai
from google.genai import types
client = genai.Client()
with open("sample.jpg", "rb") as f:
    image_bytes = f.read()
response = client.models.generate_content(
model="gemini-2.5-flash",
contents=[
types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
"Describe this image.",
],
)
print(response.text)
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [{
"parts": [
{
"inlineData": {
"mimeType": "image/jpeg",
"data": "..."
}
},
{
"text": "Describe this image."
}
]
}]
}'
# Response
{
"candidates": [
{
"content": {
"parts": [
{
"text": "This is a picture of a beautiful sunset."
}
],
"role": "model"
}
}
]
}
After (Interactions API)
In the Interactions API, you pass an array to the input field. You retrieve output data by finding the model_output step in the timeline.
Python
from google import genai
client = genai.Client()
# Assuming you have an image file
with open("sample.jpg", "rb") as f:
    image_bytes = f.read()
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input=[
{
"type": "image",
"mime_type": "image/jpeg",
"data": image_bytes,
},
{"type": "text", "text": "Describe this image."},
],
)
for step in interaction.steps:
    if step.type == "model_output":
        print(step.content[0].text)
JavaScript
import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const client = new GoogleGenAI({});
const imageBytes = fs.readFileSync('sample.jpg').toString('base64');
const interaction = await client.interactions.create({
model: 'gemini-3-flash-preview',
input: [
{
type: 'image',
mime_type: 'image/jpeg',
data: imageBytes
},
{
type: 'text',
text: 'Describe this image.'
}
]
});
for (const step of interaction.steps) {
if (step.type === 'model_output') {
console.log(step.content[0].text);
}
}
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"input": [
{
"type": "image",
"mime_type": "image/jpeg",
"data": "..."
},
{
"type": "text",
"text": "Describe this image."
}
]
}'
# Response
{
"id": "int_multimodal",
"steps": [
{
"type": "user_input",
"status": "done",
"content": [
{
"type": "image",
"mime_type": "image/jpeg",
"data": "..."
},
{
"type": "text",
"text": "Describe this image."
}
]
},
{
"type": "model_output",
"status": "done",
"content": [
{
"type": "text",
"text": "This is a picture of a beautiful sunset over the mountains."
}
]
}
]
}
Structured output
To have the model return JSON that conforms to a specific schema, configure the response format.
Before (generateContent)
With generateContent, you configure the output format via the response_format field nested in the generationConfig object.
Python
from google import genai
from google.genai import types
from pydantic import BaseModel
client = genai.Client()
class Recipe(BaseModel):
    recipe_name: str
    ingredients: list[str]
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="Give me a recipe for chocolate chip cookies.",
config=types.GenerateContentConfig(
response_format=[
{
"type": "text",
"mime_type": "application/json",
"schema": Recipe,
}
]
),
)
print(response.text)
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [{
"parts": [{
"text": "Give me a recipe for chocolate chip cookies."
}]
}],
"generationConfig": {
"responseFormat": [
{
"type": "text",
"mimeType": "application/json",
"schema": {
"type": "OBJECT",
"properties": {
"recipe_name": { "type": "STRING" },
"ingredients": {
"type": "ARRAY",
"items": { "type": "STRING" }
}
},
"required": ["recipe_name", "ingredients"]
}
}
]
}
}'
# Response
{
"candidates": [
{
"content": {
"parts": [
{
"text": "{\n \"recipe_name\": \"Chocolate Chip Cookies\",\n \"ingredients\": [\n \"1 cup butter\",\n \"1 cup sugar\",\n \"2 cups flour\",\n \"1 cup chocolate chips\"\n ]\n}"
}
],
"role": "model"
}
}
]
}
After (Interactions API)
In the Interactions API, the output format controls move into a top-level response_format array.
Python
from google import genai
from pydantic import BaseModel
client = genai.Client()
class Recipe(BaseModel):
    recipe_name: str
    ingredients: list[str]
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input="Give me a recipe for chocolate chip cookies.",
response_format=[
{
"type": "text",
"mime_type": "application/json",
"schema": Recipe,
}
],
)
for step in interaction.steps:
    if step.type == "model_output":
        print(step.content[0].text)
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
const interaction = await client.interactions.create({
model: 'gemini-3-flash-preview',
input: 'Give me a recipe for chocolate chip cookies.',
response_format: [
{
type: 'text',
mime_type: 'application/json',
schema: {
type: 'object',
properties: {
recipe_name: { type: 'string' },
ingredients: {
type: 'array',
items: { type: 'string' }
}
},
required: ['recipe_name', 'ingredients']
}
}
]
});
for (const step of interaction.steps) {
if (step.type === 'model_output') {
console.log(step.content[0].text);
}
}
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"input": "Give me a recipe for chocolate chip cookies.",
"response_format": [
{
"type": "text",
"mime_type": "application/json",
"schema": {
"type": "OBJECT",
"properties": {
"recipe_name": { "type": "STRING" },
"ingredients": {
"type": "ARRAY",
"items": { "type": "STRING" }
}
},
"required": ["recipe_name", "ingredients"]
}
}
]
}'
# Response
{
"id": "int_structured",
"steps": [
{
"type": "user_input",
"status": "done",
"content": [{ "type": "text", "text": "Give me a recipe for chocolate chip cookies." }]
},
{
"type": "model_output",
"status": "done",
"content": [
{
"type": "text",
"text": "{\n \"recipe_name\": \"Chocolate Chip Cookies\",\n \"ingredients\": [\n \"1 cup butter\",\n \"1 cup sugar\",\n \"2 cups flour\",\n \"1 cup chocolate chips\"\n ]\n}"
}
]
}
]
}
Multimodal generation
When generating content in modalities other than text (such as images or audio), the main difference is how the response structures the generated media.
Before (generateContent)
With generateContent, generated media is returned directly in the candidate's parts, typically as Base64 data in inlineData.
# Response structure concept
{
"candidates": [
{
"content": {
"parts": [
{
"text": "Here is your generated image:"
},
{
"inlineData": {
"mimeType": "image/jpeg",
"data": "...base64..."
}
}
]
}
}
]
}
After (Interactions API)
In the Interactions API, generated media appears as separate items in the content array of a model_output step on the timeline, preserving the chronological flow of the interaction.
# Response structure concept
{
"id": "int_123",
"steps": [
{
"type": "model_output",
"status": "done",
"content": [
{
"type": "text",
"text": "Here is your generated image:"
},
{
"type": "image",
"mime_type": "image/jpeg",
"data": "...base64..." // Or a reference URL in future
}
]
}
]
}
This keeps response parsing consistent with how inputs and text outputs are handled: everything is a step on the timeline.
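Because inputs, text, and media all share the same step/content shape, one small helper can collect everything the model produced. This is a sketch that operates on plain dicts shaped like the JSON responses in this guide:

```python
def extract_model_content(steps):
    """Return all content items emitted by model_output steps, in timeline order."""
    items = []
    for step in steps:
        if step.get("type") == "model_output":
            items.extend(step.get("content", []))
    return items


# Works identically for text-only and mixed-media interactions.
steps = [
    {"type": "user_input", "content": [{"type": "text", "text": "Draw a sunset."}]},
    {
        "type": "model_output",
        "content": [
            {"type": "text", "text": "Here is your generated image:"},
            {"type": "image", "mime_type": "image/jpeg", "data": "...base64..."},
        ],
    },
]
texts = [c["text"] for c in extract_model_content(steps) if c["type"] == "text"]
images = [c for c in extract_model_content(steps) if c["type"] == "image"]
```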
Server-side tools
Gemini supports built-in server-side tools such as Grounding with Google Search. The main difference is how tool execution is represented in the response.
Before (generateContent)
With generateContent, server-side tools are largely opaque. You enable the tool and receive a final response with a separate groundingMetadata object. Notably, citations are not inline: groundingSupports uses character indexes to map text segments to the web sources in groundingChunks.
Python
from google import genai
from google.genai import types
client = genai.Client()
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="Who won Euro 2024?",
config=types.GenerateContentConfig(
tools=[{"google_search": {}}]
),
)
# Access search entry point (widget) and citations
metadata = response.candidates[0].grounding_metadata
if metadata.search_entry_point:
    print(f"Search Entry Point: {metadata.search_entry_point.rendered_content}")
for support in metadata.grounding_supports:
    print(f"Citation: {support.segment.text}")
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
const response = await client.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Who won Euro 2024?',
config: {
tools: [{ google_search: {} }]
}
});
const metadata = response.candidates[0].groundingMetadata;
if (metadata.searchEntryPoint) {
console.log(`Search Entry Point: ${metadata.searchEntryPoint.renderedContent}`);
}
for (const support of metadata.groundingSupports) {
console.log(`Citation: ${support.segment.text}`);
}
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [{
"parts": [{
"text": "Who won Euro 2024?"
}]
}],
"tools": [{
"google_search": {}
}]
}'
# Response
{
"candidates": [
{
"content": {
"parts": [
{
"text": "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title."
}
],
"role": "model"
},
"groundingMetadata": {
"webSearchQueries": [
"UEFA Euro 2024 winner",
"who won euro 2024"
],
"searchEntryPoint": {
"renderedContent": "<!-- HTML and CSS for the search widget -->"
},
"groundingChunks": [
{"web": {"uri": "https://vertexaisearch.cloud.google.com.....", "title": "aljazeera.com"}},
{"web": {"uri": "https://vertexaisearch.cloud.google.com.....", "title": "uefa.com"}}
],
"groundingSupports": [
{
"segment": {"startIndex": 0, "endIndex": 85, "text": "Spain won Euro 2024, defeatin..."},
"groundingChunkIndices": [0]
},
{
"segment": {"startIndex": 86, "endIndex": 210, "text": "This victory marks Spain's..."},
"groundingChunkIndices": [0, 1]
}
]
}
}
]
}
After (Interactions API)
In the Interactions API, server-side tools are fully transparent on the timeline. The API records the call and its result as separate execution steps (google_search_call and google_search_result), showing exactly what data the model retrieved.
The API also returns inline citations. Instead of mapping indexes from a separate metadata object, the text item in the model_output step carries its own annotations array that links directly to the source.
Python
from google import genai
client = genai.Client()
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input="Who won Euro 2024?",
tools=[{"type": "google_search"}],
)
for step in interaction.steps:
    if step.type == "google_search_result":
        print(f"Search Suggestions: {step.search_suggestions}")
    elif step.type == "model_output":
        print(f"Answer: {step.content[0].text}")
        if step.content[0].annotations:
            for anno in step.content[0].annotations:
                print(f"Citation: {anno.title} ({anno.uri})")
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
const interaction = await client.interactions.create({
model: 'gemini-3-flash-preview',
input: 'Who won Euro 2024?',
tools: [{ type: 'google_search' }]
});
for (const step of interaction.steps) {
if (step.type === 'google_search_result') {
console.log(`Search Suggestions: ${step.search_suggestions}`);
} else if (step.type === 'model_output') {
console.log(`Answer: ${step.content[0].text}`);
if (step.content[0].annotations) {
for (const anno of step.content[0].annotations) {
console.log(`Citation: ${anno.title} (${anno.uri})`);
}
}
}
}
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"input": "Who won Euro 2024?",
"tools": [{"type": "google_search"}]
}'
# Response (showing grounding)
{
"id": "int_grounded",
"steps": [
{
"type": "user_input",
"status": "done",
"content": [{ "type": "text", "text": "Who won Euro 2024?" }]
},
{
"type": "google_search_call",
"status": "done",
"content": [{ "type": "text", "text": "UEFA Euro 2024 winner" }]
},
{
"type": "google_search_result",
"status": "done",
"content": [
{
"type": "text",
"text": "Spain won Euro 2024..."
}
]
},
{
"type": "model_output",
"status": "done",
"content": [
{
"type": "text",
"text": "Spain won Euro 2024, defeating England 2-1.",
"annotations": [
{
"start_index": 0,
"end_index": 42,
"uri": "https://vertexaisearch...",
"title": "aljazeera.com"
}
]
}
]
}
]
}
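To turn inline annotations into user-visible citations, you can splice numbered markers into the text using the offsets. A minimal sketch, assuming annotations carry the start_index/end_index/uri/title fields shown above (the example.com URL is a placeholder):

```python
def render_citations(text, annotations):
    """Insert [n] markers at each annotation's end offset and list the sources."""
    # Number annotations in reading order, then splice from the end of the
    # string so earlier offsets remain valid while inserting.
    numbered = list(enumerate(sorted(annotations, key=lambda a: a["end_index"]), 1))
    for n, anno in reversed(numbered):
        text = text[: anno["end_index"]] + f"[{n}]" + text[anno["end_index"]:]
    sources = [f"[{n}] {a['title']}: {a['uri']}" for n, a in numbered]
    return text, sources


annotated = {
    "text": "Spain won Euro 2024, defeating England 2-1.",
    "annotations": [
        {"start_index": 0, "end_index": 42, "uri": "https://example.com/euro-2024", "title": "example.com"}
    ],
}
cited, sources = render_citations(annotated["text"], annotated["annotations"])
```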
Function calling
The structure of function calls and results has also been aligned with the steps schema.
Before (generateContent)
With generateContent, function calls are returned in the candidates.
Python
from google import genai
from google.genai import types
client = genai.Client()
# Define the function declaration for the model (omitted in the original snippet)
weather_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="get_weather",
            description="Get the current weather",
            parameters={
                "type": "OBJECT",
                "properties": {"location": {"type": "STRING"}},
                "required": ["location"],
            },
        )
    ]
)
# Step 1: Send prompt with tools
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="What's the weather in Boston?",
config=types.GenerateContentConfig(tools=[weather_tool]),
)
# Assume model returned function_call
function_call = response.candidates[0].content.parts[0].function_call
print(f"Requested tool: {function_call.name}")
# Step 2: Execute local function and send result back
result = "52°F and rain"
response = client.models.generate_content(
model="gemini-2.5-flash",
contents=[
types.Content(
role="user",
parts=[
types.Part.from_text(text="What's the weather in Boston?")
],
),
response.candidates[0].content, # Model turn with function call
types.Content(
role="user",
parts=[
types.Part.from_function_response(
name=function_call.name,
response={"result": result},
)
],
),
],
config=types.GenerateContentConfig(tools=[weather_tool]),
)
print(response.text)
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
// Define the function declaration for the model (omitted in the original snippet)
const weatherTool = {
  functionDeclarations: [{
    name: 'get_weather',
    description: 'Get the current weather',
    parameters: {
      type: 'OBJECT',
      properties: { location: { type: 'STRING' } },
      required: ['location']
    }
  }]
};
// Step 1: Send prompt with tools
let response = await client.models.generateContent({
model: 'gemini-2.5-flash',
contents: "What's the weather in Boston?",
config: { tools: [weatherTool] }
});
const functionCall = response.candidates[0].content.parts[0].functionCall;
console.log(`Requested tool: ${functionCall.name}`);
// Step 2: Execute local function and send result back
const result = "52°F and rain";
response = await client.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
{ role: 'user', parts: [{ text: "What's the weather in Boston?" }] },
response.candidates[0].content, // Model turn
{
role: 'user',
parts: [{
functionResponse: {
name: functionCall.name,
response: { result: result }
}
}]
}
],
config: { tools: [weatherTool] }
});
console.log(response.text);
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [{
"parts": [{
"text": "What is the weather like in Boston, MA?"
}]
}],
"tools": [{
"functionDeclarations": [{
"name": "get_weather",
"description": "Get the current weather",
"parameters": {
"type": "OBJECT",
"properties": {
"location": {"type": "STRING"}
},
"required": ["location"]
}
}]
}]
}'
# Response
{
"candidates": [
{
"content": {
"parts": [
{
"functionCall": {
"name": "get_weather",
"args": { "location": "Boston, MA" }
}
}
],
"role": "model"
},
"finishReason": "STOP",
"index": 0
}
]
}
After (Interactions API)
Tool calls and results are now separate steps on the timeline.
Python
from google import genai
client = genai.Client()
weather_tool = {
"type": "function",
"name": "get_weather",
"description": "Gets weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
},
}
interaction = client.interactions.create(
model="gemini-3-flash-preview",
input="What's the weather in Boston?",
tools=[weather_tool],
)
# Check if the model requested a tool call
for step in interaction.steps:
    if step.type == "function_call":
        print(f"Executing {step.name} for {step.arguments}")
        # Execute your local function here...
        result = "52°F and rain"
        # Submit the result back as a step
        interaction = client.interactions.create(
            model="gemini-3-flash-preview",
            previous_interaction_id=interaction.id,
            input=[
                {
                    "type": "function_result",
                    "call_id": step.id,
                    "name": step.name,
                    "result": [{"type": "text", "text": result}],
                }
            ],
        )
# Inspect steps for final response
for s in interaction.steps:
    if s.type == "model_output":
        print(s.content[0].text)
JavaScript
import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});
const weatherTool = {
type: "function",
name: "get_weather",
description: "Get weather for a location",
parameters: {
type: "object",
properties: {
location: { type: "string" }
},
required: ["location"]
}
};
const interaction = await client.interactions.create({
model: 'gemini-3-flash-preview',
input: "What's the weather in Boston?",
tools: [weatherTool]
});
// Check if the model requested a tool call
for (const step of interaction.steps) {
if (step.type === 'function_call') {
console.log(`Executing ${step.name} for ${JSON.stringify(step.arguments)}`);
const result = "52°F and rain";
// Submit the result back as a step
const nextInteraction = await client.interactions.create({
model: 'gemini-3-flash-preview',
previous_interaction_id: interaction.id,
input: [
{
type: 'function_result',
call_id: step.id,
name: step.name,
result: [{ type: 'text', text: result }]
}
]
});
// Inspect steps for final response
for (const s of nextInteraction.steps) {
if (s.type === 'model_output') {
console.log(s.content[0].text);
}
}
}
}
REST
# Initial Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"input": "What'\''s the weather in Boston?",
"tools": [{
"type": "function",
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
},
"required": ["location"]
}
}]
}'
# Response (requires action)
{
"id": "int_001",
"status": "requires_action",
"steps": [
{
"type": "user_input",
"status": "done",
"content": [
{ "type": "text", "text": "What's the weather in Boston?" }
]
},
{
"type": "function_call",
"status": "waiting",
"id": "fc_1",
"name": "get_weather",
"arguments": { "location": "Boston, MA" }
}
]
}
# Submit Tool Result Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"previous_interaction_id": "int_001",
"input": {
"type": "function_result",
"call_id": "fc_1",
"name": "get_weather",
"result": [
{ "type": "text", "text": "52°F with rain" }
]
}
}'
# Final Response
{
"id": "int_002",
"status": "completed",
"steps": [
{
"type": "function_result",
"call_id": "fc_1",
"name": "get_weather",
"result": [
{ "type": "text", "text": "52°F with rain" }
]
},
{
"type": "model_output",
"status": "done",
"content": [
{ "type": "text", "text": "It's 52°F with rain in Boston." }
]
}
]
}
Streaming
A key difference in streaming is that the Interactions API uses the same endpoint with "stream": true in the request body, whereas the generateContent API required a dedicated endpoint (:streamGenerateContent).
Streaming events now also use dedicated types to track the interaction lifecycle and to capture execution steps along the timeline.
Before (generateContentStream)
With generateContent, you consume a stream of response chunks.
Python
response = client.models.generate_content_stream(
model="gemini-2.5-flash", contents="Tell me a story"
)
for chunk in response:
    print(chunk.text, end="")
JavaScript
const responseStream = await client.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: 'Tell me a story',
});
for await (const chunk of responseStream) {
process.stdout.write(chunk.text);
}
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [{
"parts": [{
"text": "Tell me a story"
}]
}]
}'
# Response stream (server-sent events, each carrying a GenerateContentResponse chunk)
data: {"candidates": [{"content": {"parts": [{"text": "Once upon"}], "role": "model"}}]}

data: {"candidates": [{"content": {"parts": [{"text": " a time..."}], "role": "model"}}]}
After (Interactions API)
In the Interactions API, streaming uses server-sent events (SSE) with dedicated delta types to represent execution steps.
Python
from google import genai
client = genai.Client()
stream = client.interactions.create(
model="gemini-3-flash-preview",
input="Tell me a story",
stream=True,
)
for event in stream:
    if event.event_type == "step.delta":
        if event.delta.type == "text":
            print(event.delta.text, end="", flush=True)
    elif event.event_type == "interaction.completed":
        print("\n\n--- Stream Finished ---")
JavaScript
import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const stream = await client.interactions.create({
  model: 'gemini-3-flash-preview',
  input: 'Tell me a story',
  stream: true,
});

for await (const event of stream) {
  if (event.event_type === 'step.delta') {
    if (event.delta.type === 'text' && 'text' in event.delta) {
      process.stdout.write(event.delta.text);
    }
  } else if (event.event_type === 'interaction.completed') {
    console.log('\n\n--- Stream Finished ---');
  }
}
REST
# Example SSE stream output
event: interaction.created
data: {"type": "interaction.created", "interaction": {"id": "int_xyz", "status": "created"}}
event: interaction.in_progress
data: {"type": "interaction.in_progress", "interaction": {"id": "int_xyz", "status": "in_progress"}}
event: step.start
data: {"type": "step.start", "index": 0, "step": {"type": "thought"}}
event: step.delta
data: {"type": "step.delta", "index": 0, "delta": {"type": "thought", "text": "User wants an explanation."}}
event: step.stop
data: {"type": "step.stop", "index": 0, "status": "done"}
event: step.start
data: {"type": "step.start", "index": 1, "step": {"type": "model_output"}}
event: step.delta
data: {"type": "step.delta", "index": 1, "delta": {"type": "text", "text": "Hello"}}
event: step.stop
data: {"type": "step.stop", "index": 1, "status": "done"}
event: interaction.completed
data: {"type": "interaction.completed", "interaction": {"id": "int_xyz", "status": "completed", "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}}}
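If you consume the REST stream directly rather than through an SDK, each SSE record is an `event:` line followed by a `data:` line containing JSON. A minimal parser sketch, assuming the well-formed event/data pairs shown in the sample stream above (`parse_sse` is an illustrative helper, not part of any SDK):

```python
import json

def parse_sse(raw):
    """Parse a raw SSE stream into (event_name, payload) pairs.

    Minimal sketch: assumes well-formed 'event:'/'data:' line pairs,
    as in the sample stream above.
    """
    events = []
    name = None
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            events.append((name, payload))
    return events

raw = """event: step.delta
data: {"type": "step.delta", "index": 1, "delta": {"type": "text", "text": "Hello"}}
event: interaction.completed
data: {"type": "interaction.completed", "interaction": {"id": "int_xyz", "status": "completed"}}"""

for name, payload in parse_sse(raw):
    print(name, payload["type"])
```

A production client would also handle multi-line `data:` fields and keep-alive comments, but this captures the dispatch pattern the SDK loops above implement for you.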
Streaming tool use and function calls
Tool behavior during streaming has changed significantly compared to generateContent, enabling finer-grained control and visibility.
Before (generateContent)
With generateContent, streamed function calls arrived complete in a single chunk. Because arguments were not generated incrementally, the handler simply checked for a complete functionCall object.
Python
from google import genai
from google.genai import types

client = genai.Client()

# weather_tool: a tool with a get_weather function declaration, defined elsewhere
stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="What's the weather in Boston?",
    config=types.GenerateContentConfig(tools=[weather_tool]),
)

for chunk in stream:
    # Function calls arrive complete; there are no partial arguments
    if chunk.candidates[0].content.parts[0].function_call:
        fc = chunk.candidates[0].content.parts[0].function_call
        print(f"Call: {fc.name}({fc.args})")
    elif chunk.text:
        print(chunk.text, end="")
JavaScript
import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

// weatherTool: a tool with a get_weather function declaration, defined elsewhere
const stream = await client.models.generateContentStream({
  model: 'gemini-2.5-flash',
  contents: "What's the weather in Boston?",
  config: { tools: [weatherTool] },
});

for await (const chunk of stream) {
  // Function calls arrive complete; there are no partial arguments
  const part = chunk.candidates[0].content.parts[0];
  if (part.functionCall) {
    console.log(`Call: ${part.functionCall.name}(${JSON.stringify(part.functionCall.args)})`);
  } else if (part.text) {
    process.stdout.write(part.text);
  }
}
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"contents": [{"parts": [{"text": "What'\''s the weather in Boston?"}]}],
"tools": [{"functionDeclarations": [{"name": "get_weather", "parameters": {"type": "OBJECT", "properties": {"location": {"type": "STRING"}}}}]}]
}'
# Response stream — function call arrives complete in one chunk
{"candidates": [{"content": {"parts": [{"functionCall": {"name": "get_weather", "args": {"location": "Boston, MA"}}}]}}]}
After (Interactions API)
The Interactions API streams function call arguments incrementally as arguments deltas. The full tool lifecycle (thought, call, result, and output) unfolds as a series of distinct steps.
Python
from google import genai

client = genai.Client()

# get_weather_tool: a function tool declaration, defined elsewhere
stream = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What's the weather in Boston?",
    tools=[get_weather_tool],
    stream=True,
)

for event in stream:
    if event.event_type == "step.start":
        if event.step.type == "function_call":
            print(f"Calling: {event.step.name}")
    elif event.event_type == "step.delta":
        if event.delta.type == "arguments":
            print(f"  args: {event.delta.partial_arguments}")
        elif event.delta.type == "text":
            print(event.delta.text, end="")
    elif event.event_type == "interaction.completed":
        print("\n--- Done ---")
JavaScript
import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

// getWeatherTool: a function tool declaration, defined elsewhere
const stream = await client.interactions.create({
  model: 'gemini-3-flash-preview',
  input: "What's the weather in Boston?",
  tools: [getWeatherTool],
  stream: true,
});

for await (const event of stream) {
  if (event.event_type === 'step.start') {
    if (event.step.type === 'function_call') {
      console.log(`Calling: ${event.step.name}`);
    }
  } else if (event.event_type === 'step.delta') {
    if (event.delta.type === 'arguments') {
      console.log(`  args: ${event.delta.partial_arguments}`);
    } else if (event.delta.type === 'text') {
      process.stdout.write(event.delta.text);
    }
  } else if (event.event_type === 'interaction.completed') {
    console.log('\n--- Done ---');
  }
}
REST
# Request
curl -X POST "https://generativelanguage.googleapis.com/v1beta2/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
"model": "gemini-3-flash-preview",
"input": "What'\''s the weather in Boston?",
"tools": [{"type": "function", "name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}],
"stream": true
}'
# Response stream
// Interaction created
event: interaction.created
data: {"type": "interaction.created", "interaction": {"id": "int_xyz", "status": "created"}}
event: interaction.in_progress
data: {"type": "interaction.in_progress", "interaction": {"id": "int_xyz", "status": "in_progress"}}
// ── Step 0: Thought ──────────────────────────────────
event: step.start
data: {"type": "step.start", "index": 0, "step": {"type": "thought"}}
event: step.delta
data: {"type": "step.delta", "index": 0, "delta": {"type": "thought", "text": "The user wants weather data for Boston. I'll call the get_weather tool."}}
event: step.stop
data: {"type": "step.stop", "index": 0, "status": "done"}
// ── Step 1: Function Call (arguments streamed) ───────
event: step.start
data: {"type": "step.start", "index": 1, "step": {"type": "function_call", "id": "fc_1", "name": "get_weather"}}
event: step.delta
data: {"type": "step.delta", "index": 1, "delta": {"type": "arguments", "partial_arguments": "{\"location\": \"Boston, MA\"}"}}
event: step.stop
data: {"type": "step.stop", "index": 1, "status": "waiting"}
// The interaction pauses — the model needs the tool result before continuing.
event: interaction.requires_action
data: {"type": "interaction.requires_action", "interaction": {"id": "int_xyz", "status": "requires_action"}}
// ── (Client submits the tool result) ──────────────────
// The client calls interactions.create with the function_result as input
// and the previous interaction's ID, then resumes consuming the stream.
event: interaction.in_progress
data: {"type": "interaction.in_progress", "interaction": {"id": "int_xyz", "status": "in_progress"}}
// ── Step 2: Function Result (echoed back, no deltas) ─
event: step.start
data: {"type": "step.start", "index": 2, "step": {"type": "function_result", "call_id": "fc_1", "name": "get_weather", "result": [{"type": "text", "text": "52°F, rain"}]}}
event: step.stop
data: {"type": "step.stop", "index": 2, "status": "done"}
// ── Step 3: Thought ──────────────────────────────────
event: step.start
data: {"type": "step.start", "index": 3, "step": {"type": "thought"}}
event: step.delta
data: {"type": "step.delta", "index": 3, "delta": {"type": "thought", "text": "Got weather data. Composing the final response."}}
event: step.stop
data: {"type": "step.stop", "index": 3, "status": "done"}
// ── Step 4: Model Output (text streamed) ─────────────
event: step.start
data: {"type": "step.start", "index": 4, "step": {"type": "model_output"}}
event: step.delta
data: {"type": "step.delta", "index": 4, "delta": {"type": "text", "text": "It's currently 52°F and rainy in Boston."}}
event: step.stop
data: {"type": "step.stop", "index": 4, "status": "done"}
// ── Interaction complete ─────────────────────────────
event: interaction.completed
data: {"type": "interaction.completed", "interaction": {"id": "int_xyz", "status": "completed", "usage": {"prompt_tokens": 256, "completion_tokens": 128, "total_tokens": 384}}}
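Because arguments arrive as JSON fragments across multiple step.delta events, a client typically buffers them until the function-call step stops and only then parses them before invoking the local tool. A minimal sketch, assuming the raw event shapes shown in the stream above (`collect_function_call` is a hypothetical helper, not part of the SDK):

```python
import json

def collect_function_call(events):
    """Assemble a streamed function call from raw step events.

    Buffers 'arguments' fragments from step.delta events and parses
    the joined string as JSON. Event shapes follow the sample stream
    above; this is an illustrative sketch only.
    """
    name, fragments = None, []
    for event in events:
        if event["type"] == "step.start" and event["step"]["type"] == "function_call":
            name = event["step"]["name"]
        elif event["type"] == "step.delta" and event["delta"]["type"] == "arguments":
            fragments.append(event["delta"]["partial_arguments"])
    return name, json.loads("".join(fragments))

# Fragments as they might arrive over several step.delta events
events = [
    {"type": "step.start", "step": {"type": "function_call", "id": "fc_1", "name": "get_weather"}},
    {"type": "step.delta", "delta": {"type": "arguments", "partial_arguments": '{"location": '}},
    {"type": "step.delta", "delta": {"type": "arguments", "partial_arguments": '"Boston, MA"}'}},
]
name, args = collect_function_call(events)
print(name, args)
```

The parsed arguments drive your local tool call; as the stream comments above note, its result is then submitted via interactions.create with the previous interaction's ID so the model can resume and produce the final output.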