Interactions API

Interactions API (Beta 版) 是統一的介面，可與 Gemini 模型和代理程式互動。這項 API 是 generateContent API 的改良替代方案，可簡化狀態管理、工具自動化調度管理和長時間執行的工作。如要全面瞭解 API 結構定義，請參閱 API 參考資料。在 Beta 版期間，功能和結構定義可能會出現重大變更。如要快速入門，請試用 Interactions API 快速入門筆記本。

以下範例說明如何使用文字提示呼叫 Interactions API。

Python

from google import genai

client = genai.Client()

interaction =  client.interactions.create(
    model="gemini-3-flash-preview",
    input="Tell me a short joke about programming."
)

print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction =  await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Tell me a short joke about programming.',
});

console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Tell me a short joke about programming."
}'

基本互動

您可以透過現有 SDK 使用 Interactions API。與模型互動最簡單的方式就是提供文字提示。input 可以是字串、包含內容物件的清單，或是包含角色和內容物件的回合清單。

Python

from google import genai

client = genai.Client()

interaction =  client.interactions.create(
    model="gemini-3-flash-preview",
    input="Tell me a short joke about programming."
)

print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction =  await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Tell me a short joke about programming.',
});

console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Tell me a short joke about programming."
}'

對話

您可以透過兩種方式建立多輪對話：

有狀態，可參照先前的互動
無狀態，方法是提供完整對話記錄

有狀態的對話

將先前互動的 id 傳遞至 previous_interaction_id 參數，繼續對話。

Python

from google import genai

client = genai.Client()

# 1. First turn
interaction1 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Hi, my name is Phil."
)
print(f"Model: {interaction1.outputs[-1].text}")

# 2. Second turn (passing previous_interaction_id)
interaction2 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is my name?",
    previous_interaction_id=interaction1.id
)
print(f"Model: {interaction2.outputs[-1].text}")

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

// 1. First turn
const interaction1 = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Hi, my name is Phil.'
});
console.log(`Model: ${interaction1.outputs[interaction1.outputs.length - 1].text}`);

// 2. Second turn (passing previous_interaction_id)
const interaction2 = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'What is my name?',
    previous_interaction_id: interaction1.id
});
console.log(`Model: ${interaction2.outputs[interaction2.outputs.length - 1].text}`);

REST

# 1. First turn
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Hi, my name is Phil."
}'

# 2. Second turn (Replace INTERACTION_ID with the ID from the previous interaction)
# curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
# -H "Content-Type: application/json" \
# -H "x-goog-api-key: $GEMINI_API_KEY" \
# -d '{
#     "model": "gemini-3-flash-preview",
#     "input": "What is my name?",
#     "previous_interaction_id": "INTERACTION_ID"
# }'

擷取先前的有狀態互動

使用互動 id 擷取先前的對話輪次。

Python

previous_interaction = client.interactions.get("<YOUR_INTERACTION_ID>")

print(previous_interaction)

JavaScript

const previous_interaction = await client.interactions.get("<YOUR_INTERACTION_ID>");
console.log(previous_interaction);

REST

curl -X GET "https://generativelanguage.googleapis.com/v1beta/interactions/<YOUR_INTERACTION_ID>" \
-H "x-goog-api-key: $GEMINI_API_KEY"

包含原始輸入內容

根據預設，interactions.get() 只會傳回模型的輸出內容。如要在回應中加入原始標準化輸入內容，請將 include_input 設為 true。

Python

interaction = client.interactions.get(
    "<YOUR_INTERACTION_ID>",
    include_input=True
)

print(f"Input: {interaction.input}")
print(f"Output: {interaction.outputs}")

JavaScript

const interaction = await client.interactions.get(
    "<YOUR_INTERACTION_ID>",
    { include_input: true }
);

console.log(`Input: ${JSON.stringify(interaction.input)}`);
console.log(`Output: ${JSON.stringify(interaction.outputs)}`);

REST

curl -X GET "https://generativelanguage.googleapis.com/v1beta/interactions/<YOUR_INTERACTION_ID>?include_input=true" \
-H "x-goog-api-key: $GEMINI_API_KEY"

無狀態對話

您可以在用戶端手動管理對話記錄。

Python

from google import genai

client = genai.Client()

conversation_history = [
    {
        "role": "user",
        "content": "What are the three largest cities in Spain?"
    }
]

interaction1 = client.interactions.create(
    model="gemini-3-flash-preview",
    input=conversation_history
)

print(f"Model: {interaction1.outputs[-1].text}")

conversation_history.append({"role": "model", "content": interaction1.outputs})
conversation_history.append({
    "role": "user",
    "content": "What is the most famous landmark in the second one?"
})

interaction2 = client.interactions.create(
    model="gemini-3-flash-preview",
    input=conversation_history
)

print(f"Model: {interaction2.outputs[-1].text}")

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const conversationHistory = [
    {
        role: 'user',
        content: "What are the three largest cities in Spain?"
    }
];

const interaction1 = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: conversationHistory
});

console.log(`Model: ${interaction1.outputs[interaction1.outputs.length - 1].text}`);

conversationHistory.push({ role: 'model', content: interaction1.outputs });
conversationHistory.push({
    role: 'user',
    content: "What is the most famous landmark in the second one?"
});

const interaction2 = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: conversationHistory
});

console.log(`Model: ${interaction2.outputs[interaction2.outputs.length - 1].text}`);

REST

 curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
 -H "Content-Type: application/json" \
 -H "x-goog-api-key: $GEMINI_API_KEY" \
 -d '{
    "model": "gemini-3-flash-preview",
    "input": [
        {
            "role": "user",
            "content": "What are the three largest cities in Spain?"
        },
        {
            "role": "model",
            "content": "The three largest cities in Spain are Madrid, Barcelona, and Valencia."
        },
        {
            "role": "user",
            "content": "What is the most famous landmark in the second one?"
        }
    ]
}'

多模態功能

您可以將 Interactions API 用於多模態用途，例如圖片理解或影片生成。

多模態理解

您可以內嵌 base64 編碼資料來提供多模態輸入內容，也可以使用 Files API 處理較大的檔案，或在 uri 欄位中傳遞可公開存取的連結。以下程式碼範例示範公開網址方法。

圖像解讀

Python

from google import genai
client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Describe the image."},
        {
            "type": "image",
            "uri": "YOUR_URL",
            "mime_type": "image/png"
        }
    ]
)
print(interaction.outputs[-1].text)

JavaScript

import {GoogleGenAI} from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: [
        {type: 'text', text: 'Describe the image.'},
        {
            type: 'image',
            uri: 'YOUR_URL',
            mime_type: 'image/png'
        }
    ]
});
console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": [
    {
        "type": "text",
        "text": "Describe the image."
    },
    {
        "type": "image",
        "uri": "YOUR_URL",
        "mime_type": "image/png"
    }
    ]
}'

音訊理解

Python

from google import genai
client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "What does this audio say?"},
        {
            "type": "audio",
            "uri": "YOUR_URL",
            "mime_type": "audio/wav"
        }
    ]
)
print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: [
        { type: 'text', text: 'What does this audio say?' },
        {
            type: 'audio',
            uri: 'YOUR_URL',
            mime_type: 'audio/wav'
        }
    ]
});

console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": [
        {"type": "text", "text": "What does this audio say?"},
        {
            "type": "audio",
            "uri": "YOUR_URL",
            "mime_type": "audio/wav"
        }
    ]
}'

影片解讀

Python

from google import genai
client = genai.Client()

print("Analyzing video...")
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "What is happening in this video? Provide a timestamped summary."},
        {
            "type": "video",
            "uri": "YOUR_URL",
            "mime_type": "video/mp4"
        }
    ]
)

print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

console.log('Analyzing video...');
const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: [
        { type: 'text', text: 'What is happening in this video? Provide a timestamped summary.' },
        {
            type: 'video',
            uri: 'YOUR_URL',
            mime_type: 'video/mp4'
        }
    ]
});

console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": [
        {"type": "text", "text": "What is happening in this video?"},
        {
            "type": "video",
            "uri": "YOUR_URL",
            "mime_type": "video/mp4"
        }
    ]
}'

解讀文件 (PDF)

Python

from google import genai
client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "What is this document about?"},
        {
            "type": "document",
            "uri": "YOUR_URL",
            "mime_type": "application/pdf"
        }
    ]
)
print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: [
        { type: 'text', text: 'What is this document about?' },
        {
            type: 'document',
            uri: 'YOUR_URL',
            mime_type: 'application/pdf'
        }
    ],
});
console.log(interaction.outputs[0].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": [
        {"type": "text", "text": "What is this document about?"},
        {
            "type": "document",
            "uri": "YOUR_URL",
            "mime_type": "application/pdf"
        }
    ]
}'

多模態生成

您可以使用 Interactions API 生成多模態輸出內容。

圖像生成

Python

import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-pro-image-preview",
    input="Generate an image of a futuristic city.",
    response_modalities=["IMAGE"]
)

for output in interaction.outputs:
    if output.type == "image":
        print(f"Generated image with mime_type: {output.mime_type}")
        # Save the image
        with open("generated_city.png", "wb") as f:
            f.write(base64.b64decode(output.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-pro-image-preview',
    input: 'Generate an image of a futuristic city.',
    response_modalities: ['IMAGE']
});

for (const output of interaction.outputs) {
    if (output.type === 'image') {
        console.log(`Generated image with mime_type: ${output.mime_type}`);
        // Save the image
        fs.writeFileSync('generated_city.png', Buffer.from(output.data, 'base64'));
    }
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-pro-image-preview",
    "input": "Generate an image of a futuristic city.",
    "response_modalities": ["IMAGE"]
}'

設定圖片輸出

您可以使用 generation_config 內的 image_config 自訂生成的圖片，控制長寬比和解析度。

參數	選項	說明
`aspect_ratio`	`1:1`、`2:3`、`3:2`、`3:4`、`4:3`、`4:5`、`5:4`、`9:16`、`16:9`、`21:9`	控制輸出圖片的寬高比。
`image_size`	`1k`、`2k`、`4k`	設定輸出圖片解析度。

Python

import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-pro-image-preview",
    input="Generate an image of a futuristic city.",
    generation_config={
        "image_config": {
            "aspect_ratio": "9:16",
            "image_size": "2k"
        }
    }
)

for output in interaction.outputs:
    if output.type == "image":
        print(f"Generated image with mime_type: {output.mime_type}")
        # Save the image
        with open("generated_city.png", "wb") as f:
            f.write(base64.b64decode(output.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-pro-image-preview',
    input: 'Generate an image of a futuristic city.',
    generation_config: {
        image_config: {
            aspect_ratio: '9:16',
            image_size: '2k'
        }
    }
});

for (const output of interaction.outputs) {
    if (output.type === 'image') {
        console.log(`Generated image with mime_type: ${output.mime_type}`);
        // Save the image
        fs.writeFileSync('generated_city.png', Buffer.from(output.data, 'base64'));
    }
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-pro-image-preview",
    "input": "Generate an image of a futuristic city.",
    "generation_config": {
        "image_config": {
            "aspect_ratio": "9:16",
            "image_size": "2k"
        }
    }
}'

語音生成

使用文字轉語音 (TTS) 模型，根據文字生成自然流暢的語音內容。使用 speech_config
參數設定語音、語言和揚聲器。

Python

import base64
from google import genai
import wave

# Set up the wave file to save the output:
def wave_file(filename, pcm, channels=1, rate=24000, sample_width=2):
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-2.5-flash-preview-tts",
    input="Say the following: WOOHOO This is so much fun!.",
    response_modalities=["AUDIO"],
    generation_config={
        "speech_config": {
            "language": "en-us",
            "voice": "kore"
        }
    }
)

for output in interaction.outputs:
    if output.type == "audio":
        print(f"Generated audio with mime_type: {output.mime_type}")
        # Save the audio as wave file to the current directory.
        wave_file("generated_audio.wav", base64.b64decode(output.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
import wav from 'wav';

async function saveWaveFile(
    filename,
    pcmData,
    channels = 1,
    rate = 24000,
    sampleWidth = 2,
) {
    return new Promise((resolve, reject) => {
        const writer = new wav.FileWriter(filename, {
                channels,
                sampleRate: rate,
                bitDepth: sampleWidth * 8,
        });

        writer.on('finish', resolve);
        writer.on('error', reject);

        writer.write(pcmData);
        writer.end();
    });
}

async function main() {
    const GEMINI_API_KEY = process.env.GEMINI_API_KEY;
    const client = new GoogleGenAI({apiKey: GEMINI_API_KEY});

    const interaction = await client.interactions.create({
        model: 'gemini-2.5-flash-preview-tts',
        input: 'Say the following: WOOHOO This is so much fun!.',
        response_modalities: ['AUDIO'],
        generation_config: {
            speech_config: {
                language: "en-us",
                voice: "kore"
            }
        }
    });

    for (const output of interaction.outputs) {
        if (output.type === 'audio') {
            console.log(`Generated audio with mime_type: ${output.mime_type}`);
            const audioBuffer = Buffer.from(output.data, 'base64');
            // Save the audio as wave file to the current directory
            await saveWaveFile("generated_audio.wav", audioBuffer);
        }
    }
}
await main();

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Say the following: WOOHOO This is so much fun!.",
    "response_modalities": ["AUDIO"],
    "generation_config": {
        "speech_config": {
            "language": "en-us",
            "voice": "kore"
        }
    }
}' | jq -r '.outputs[] | select(.type == "audio") | .data' | base64 -d > generated_audio.pcm
# You may need to install ffmpeg.
ffmpeg -f s16le -ar 24000 -ac 1 -i generated_audio.pcm generated_audio.wav

生成多位說話者的語音

在提示中指定說話者名稱，並在 speech_config 中比對，即可生成多位說話者的語音。

提示應包含說話者姓名：

TTS the following conversation between Alice and Bob:
Alice: Hi Bob, how are you doing today?
Bob: I'm doing great, thanks for asking! How about you?
Alice: Fantastic! I just learned about the Gemini API.

然後使用相符的音箱設定 speech_config：

"generation_config": {
    "speech_config": [
        {"voice": "Zephyr", "speaker": "Alice", "language": "en-US"},
        {"voice": "Puck", "speaker": "Bob", "language": "en-US"}
    ]
}

代理能力

Interactions API 專為建構及與代理互動而設計，支援函式呼叫、內建工具、結構化輸出內容和 Model Context Protocol (MCP)。

代理

您可以針對複雜工作使用 deep-research-pro-preview-12-2025 等專用代理程式。如要進一步瞭解 Gemini Deep Research 代理程式，請參閱「Deep Research」指南。

Python

import time
from google import genai

client = genai.Client()

# 1. Start the Deep Research Agent
initial_interaction = client.interactions.create(
    input="Research the history of the Google TPUs with a focus on 2025 and 2026.",
    agent="deep-research-pro-preview-12-2025",
    background=True
)

print(f"Research started. Interaction ID: {initial_interaction.id}")

# 2. Poll for results
while True:
    interaction = client.interactions.get(initial_interaction.id)
    print(f"Status: {interaction.status}")

    if interaction.status == "completed":
        print("\nFinal Report:\n", interaction.outputs[-1].text)
        break
    elif interaction.status in ["failed", "cancelled"]:
        print(f"Failed with status: {interaction.status}")
        break

    time.sleep(10)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

// 1. Start the Deep Research Agent
const initialInteraction = await client.interactions.create({
    input: 'Research the history of the Google TPUs with a focus on 2025 and 2026.',
    agent: 'deep-research-pro-preview-12-2025',
    background: true
});

console.log(`Research started. Interaction ID: ${initialInteraction.id}`);

// 2. Poll for results
while (true) {
    const interaction = await client.interactions.get(initialInteraction.id);
    console.log(`Status: ${interaction.status}`);

    if (interaction.status === 'completed') {
        console.log('\nFinal Report:\n', interaction.outputs[interaction.outputs.length - 1].text);
        break;
    } else if (['failed', 'cancelled'].includes(interaction.status)) {
        console.log(`Failed with status: ${interaction.status}`);
        break;
    }

    await new Promise(resolve => setTimeout(resolve, 10000));
}

REST

# 1. Start the Deep Research Agent
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "input": "Research the history of the Google TPUs with a focus on 2025 and 2026.",
    "agent": "deep-research-pro-preview-12-2025",
    "background": true
}'

# 2. Poll for results (Replace INTERACTION_ID with the ID from the previous interaction)
# curl -X GET "https://generativelanguage.googleapis.com/v1beta/interactions/INTERACTION_ID" \
# -H "x-goog-api-key: $GEMINI_API_KEY"

工具和函式呼叫

本節說明如何使用函式呼叫定義自訂工具，以及如何在 Interactions API 中使用 Google 內建工具。

函式呼叫

Python

from google import genai

client = genai.Client()

# 1. Define the tool
def get_weather(location: str):
    """Gets the weather for a given location."""
    return f"The weather in {location} is sunny."

weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Gets the weather for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}
        },
        "required": ["location"]
    }
}

# 2. Send the request with tools
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is the weather in Paris?",
    tools=[weather_tool]
)

# 3. Handle the tool call
for output in interaction.outputs:
    if output.type == "function_call":
        print(f"Tool Call: {output.name}({output.arguments})")
        # Execute tool
        result = get_weather(**output.arguments)

        # Send result back
        interaction = client.interactions.create(
            model="gemini-3-flash-preview",
            previous_interaction_id=interaction.id,
            input=[{
                "type": "function_result",
                "name": output.name,
                "call_id": output.id,
                "result": result
            }]
        )
        print(f"Response: {interaction.outputs[-1].text}")

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

// 1. Define the tool
const weatherTool = {
    type: 'function',
    name: 'get_weather',
    description: 'Gets the weather for a given location.',
    parameters: {
        type: 'object',
        properties: {
            location: { type: 'string', description: 'The city and state, e.g. San Francisco, CA' }
        },
        required: ['location']
    }
};

// 2. Send the request with tools
let interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'What is the weather in Paris?',
    tools: [weatherTool]
});

// 3. Handle the tool call
for (const output of interaction.outputs) {
    if (output.type === 'function_call') {
        console.log(`Tool Call: ${output.name}(${JSON.stringify(output.arguments)})`);

        // Execute tool (Mocked)
        const result = `The weather in ${output.arguments.location} is sunny.`;

        // Send result back
        interaction = await client.interactions.create({
            model: 'gemini-3-flash-preview',
            previous_interaction_id:interaction.id,
            input: [{
                type: 'function_result',
                name: output.name,
                call_id: output.id,
                result: result
            }]
        });
        console.log(`Response: ${interaction.outputs[interaction.outputs.length - 1].text}`);
    }
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "What is the weather in Paris?",
    "tools": [{
        "type": "function",
        "name": "get_weather",
        "description": "Gets the weather for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}
            },
            "required": ["location"]
        }
    }]
}'

# Handle the tool call and send result back (Replace INTERACTION_ID and CALL_ID)
# curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
# -H "Content-Type: application/json" \
# -H "x-goog-api-key: $GEMINI_API_KEY" \
# -d '{
#     "model": "gemini-3-flash-preview",
#     "previous_interaction_id": "INTERACTION_ID",
#     "input": [{
#         "type": "function_result",
#         "name": "get_weather",
#         "call_id": "FUNCTION_CALL_ID",
#         "result": "The weather in Paris is sunny."
#     }]
# }'

使用用戶端狀態呼叫函式

如果不想使用伺服器端狀態，可以在用戶端管理所有狀態。

Python

from google import genai
client = genai.Client()

functions = [
    {
        "type": "function",
        "name": "schedule_meeting",
        "description": "Schedules a meeting with specified attendees at a given time and date.",
        "parameters": {
            "type": "object",
            "properties": {
                "attendees": {"type": "array", "items": {"type": "string"}},
                "date": {"type": "string", "description": "Date of the meeting (e.g., 2024-07-29)"},
                "time": {"type": "string", "description": "Time of the meeting (e.g., 15:00)"},
                "topic": {"type": "string", "description": "The subject of the meeting."},
            },
            "required": ["attendees", "date", "time", "topic"],
        },
    }
]

history = [{"role": "user","content": [{"type": "text", "text": "Schedule a meeting for 2025-11-01 at 10 am with Peter and Amir about the Next Gen API."}]}]

# 1. Model decides to call the function
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=history,
    tools=functions
)

# add model interaction back to history
history.append({"role": "model", "content": interaction.outputs})

for output in interaction.outputs:
    if output.type == "function_call":
        print(f"Function call: {output.name} with arguments {output.arguments}")

        # 2. Execute the function and get a result
        # In a real app, you would call your function here.
        # call_result = schedule_meeting(**json.loads(output.arguments))
        call_result = "Meeting scheduled successfully."

        # 3. Send the result back to the model
        history.append({"role": "user", "content": [{"type": "function_result", "name": output.name, "call_id": output.id, "result": call_result}]})

        interaction2 = client.interactions.create(
            model="gemini-3-flash-preview",
            input=history,
        )
        print(f"Final response: {interaction2.outputs[-1].text}")
    else:
        print(f"Output: {output}")

JavaScript

// 1. Define the tool
const functions = [
    {
        type: 'function',
        name: 'schedule_meeting',
        description: 'Schedules a meeting with specified attendees at a given time and date.',
        parameters: {
            type: 'object',
            properties: {
                attendees: { type: 'array', items: { type: 'string' } },
                date: { type: 'string', description: 'Date of the meeting (e.g., 2024-07-29)' },
                time: { type: 'string', description: 'Time of the meeting (e.g., 15:00)' },
                topic: { type: 'string', description: 'The subject of the meeting.' },
            },
            required: ['attendees', 'date', 'time', 'topic'],
        },
    },
];

const history = [
    { role: 'user', content: [{ type: 'text', text: 'Schedule a meeting for 2025-11-01 at 10 am with Peter and Amir about the Next Gen API.' }] }
];

// 2. Model decides to call the function
let interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: history,
    tools: functions
});

// add model interaction back to history
history.push({ role: 'model', content: interaction.outputs });

for (const output of interaction.outputs) {
    if (output.type === 'function_call') {
        console.log(`Function call: ${output.name} with arguments ${JSON.stringify(output.arguments)}`);

        // 3. Send the result back to the model
        history.push({ role: 'user', content: [{ type: 'function_result', name: output.name, call_id: output.id, result: 'Meeting scheduled successfully.' }] });

        const interaction2 = await client.interactions.create({
            model: 'gemini-3-flash-preview',
            input: history,
        });
        console.log(`Final response: ${interaction2.outputs[interaction2.outputs.length - 1].text}`);
    }
}

多模態函式結果

function_result 中的 result 欄位可接受純字串，或 TextContent 和 ImageContent 物件的陣列。這樣一來，您就能在函式呼叫中傳回文字，以及螢幕截圖或圖表等圖片，讓模型根據視覺輸出內容進行推理。

Python

import base64
from google import genai

client = genai.Client()

functions = [
    {
        "type": "function",
        "name": "take_screenshot",
        "description": "Takes a screenshot of a specified website.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The URL to take a screenshot of."},
            },
            "required": ["url"],
        },
    }
]

# 1. Model decides to call the function
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Can you take a screenshot of https://google.com and tell me what you see?",
    tools=functions
)

for output in interaction.outputs:
    if output.type == "function_call":
        print(f"Function call: {output.name}({output.arguments})")

        # 2. Execute the function and load the image
        # Replace with actual function call, pseudo code for reading image from disk
        with open("screenshot.png", "rb") as f:
            base64_image = base64.b64encode(f.read()).decode("utf-8")

        # 3. Return a multimodal result (text + image)
        call_result = [
            {"type": "text", "text": "Screenshot captured successfully."},
            {"type": "image", "mime_type": "image/png", "data": base64_image}
        ]

        response = client.interactions.create(
            model="gemini-3-flash-preview",
            tools=functions,
            previous_interaction_id=interaction.id,
            input=[{
                "type": "function_result",
                "name": output.name,
                "call_id": output.id,
                "result": call_result
            }]
        )
        print(f"Response: {response.outputs[-1].text}")

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';

const client = new GoogleGenAI({});

const functions = [
    {
        type: 'function',
        name: 'take_screenshot',
        description: 'Takes a screenshot of a specified website.',
        parameters: {
            type: 'object',
            properties: {
                url: { type: 'string', description: 'The URL to take a screenshot of.' },
            },
            required: ['url'],
        },
    }
];

// 1. Model decides to call the function
let interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Can you take a screenshot of https://google.com and tell me what you see?',
    tools: functions
});

for (const output of interaction.outputs) {
    if (output.type === 'function_call') {
        console.log(`Function call: ${output.name}(${JSON.stringify(output.arguments)})`);

        // 2. Execute the function and load the image
        // Replace with actual function call, pseudo code for reading image from disk
        const base64Image = fs.readFileSync('screenshot.png').toString('base64');

        // 3. Return a multimodal result (text + image)
        const callResult = [
            { type: 'text', text: 'Screenshot captured successfully.' },
            { type: 'image', mime_type: 'image/png', data: base64Image }
        ];

        const response = await client.interactions.create({
            model: 'gemini-3-flash-preview',
            tools: functions,
            previous_interaction_id: interaction.id,
            input: [{
                type: 'function_result',
                name: output.name,
                call_id: output.id,
                result: callResult
            }]
        });
        console.log(`Response: ${response.outputs[response.outputs.length - 1].text}`);
    }
}

REST

# 1. Send request with tools (will return a function_call)
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Can you take a screenshot of https://google.com and tell me what you see?",
    "tools": [{
        "type": "function",
        "name": "take_screenshot",
        "description": "Takes a screenshot of a specified website.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The URL to take a screenshot of."}
            },
            "required": ["url"]
        }
    }]
}'

# 2. Send multimodal result back (Replace INTERACTION_ID, CALL_ID, and BASE64_IMAGE)
# curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
# -H "Content-Type: application/json" \
# -H "x-goog-api-key: $GEMINI_API_KEY" \
# -d '{
#     "model": "gemini-3-flash-preview",
#     "tools": [{"type": "function", "name": "take_screenshot", "description": "Takes a screenshot of a specified website.", "parameters": {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}}],
#     "previous_interaction_id": "INTERACTION_ID",
#     "input": [{
#         "type": "function_result",
#         "name": "take_screenshot",
#         "call_id": "CALL_ID",
#         "result": [
#             {"type": "text", "text": "Screenshot captured successfully."},
#             {"type": "image", "mime_type": "image/png", "data": "BASE64_IMAGE"}
#         ]
#     }]
# }'

內建工具

Gemini 內建多種工具，例如：透過 Google 搜尋建立基準、程式碼執行、網址內容和電腦使用

以 Google 搜尋建立基準

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Who won the last Super Bowl?",
    tools=[{"type": "google_search"}]
)
# Find the text output (not the GoogleSearchResultContent)
text_output = next((o for o in interaction.outputs if o.type == "text"), None)
if text_output:
    print(text_output.text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Who won the last Super Bowl?',
    tools: [{ type: 'google_search' }]
});
// Find the text output (not the GoogleSearchResultContent)
const textOutput = interaction.outputs.find(o => o.type === 'text');
if (textOutput) console.log(textOutput.text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Who won the last Super Bowl?",
    "tools": [{"type": "google_search"}]
}'

程式碼執行

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Calculate the 50th Fibonacci number.",
    tools=[{"type": "code_execution"}]
)
print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Calculate the 50th Fibonacci number.',
    tools: [{ type: 'code_execution' }]
});
console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Calculate the 50th Fibonacci number.",
    "tools": [{"type": "code_execution"}]
}'

網址背景資訊

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Summarize the content of https://www.wikipedia.org/",
    tools=[{"type": "url_context"}]
)
# Find the text output (not the URLContextResultContent)
text_output = next((o for o in interaction.outputs if o.type == "text"), None)
if text_output:
    print(text_output.text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Summarize the content of https://www.wikipedia.org/',
    tools: [{ type: 'url_context' }]
});
// Find the text output (not the URLContextResultContent)
const textOutput = interaction.outputs.find(o => o.type === 'text');
if (textOutput) console.log(textOutput.text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Summarize the content of https://www.wikipedia.org/",
    "tools": [{"type": "url_context"}]
}'

操作電腦

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-2.5-computer-use-preview-10-2025",
    input="Search for highly rated smart fridges with touchscreen, 2 doors, around 25 cu ft, priced below 4000 dollars on Google Shopping. Create a bulleted list of the 3 cheapest options in the format of name, description, price in an easy-to-read layout.",
    tools=[{
        "type": "computer_use",
        "environment": "browser",
        "excludedPredefinedFunctions": ["drag_and_drop"]
    }]
)

# The response will contain tool calls (actions) for the computer interface
# or text explaining the action
for output in interaction.outputs:
    print(output)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-2.5-computer-use-preview-10-2025',
    input: 'Search for highly rated smart fridges with touchscreen, 2 doors, around 25 cu ft, priced below 4000 dollars on Google Shopping. Create a bulleted list of the 3 cheapest options in the format of name, description, price in an easy-to-read layout.',
    tools: [{
        type: 'computer_use',
        environment: 'browser',
        excludedPredefinedFunctions: ['drag_and_drop']
    }]
});

// The response will contain tool calls (actions) for the computer interface
// or text explaining the action
interaction.outputs.forEach(output => console.log(output));

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-2.5-computer-use-preview-10-2025",
    "input": "Search for highly rated smart fridges with touchscreen, 2 doors, around 25 cu ft, priced below 4000 dollars on Google Shopping. Create a bulleted list of the 3 cheapest options in the format of name, description, price in an easy-to-read layout.",
    "tools": [{
        "type": "computer_use",
        "environment": "browser",
        "excludedPredefinedFunctions": ["drag_and_drop"]
    }]
}'

遠端模型內容通訊協定 (MCP)

遠端 MCP 整合功能可讓 Gemini API 直接呼叫遠端伺服器上託管的外部工具，簡化代理程式開發作業。

Python

import datetime
from google import genai

client = genai.Client()

mcp_server = {
    "type": "mcp_server",
    "name": "weather_service",
    "url": "https://gemini-api-demos.uc.r.appspot.com/mcp"
}

today = datetime.date.today().strftime("%d %B %Y")

interaction = client.interactions.create(
    model="gemini-2.5-flash",
    input="What is the weather like in New York today?",
    tools=[mcp_server],
    system_instruction=f"Today is {today}."
)

print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const mcpServer = {
    type: 'mcp_server',
    name: 'weather_service',
    url: 'https://gemini-api-demos.uc.r.appspot.com/mcp'
};

const today = new Date().toDateString();

const interaction = await client.interactions.create({
    model: 'gemini-2.5-flash',
    input: 'What is the weather like in New York today?',
    tools: [mcpServer],
    system_instruction: `Today is ${today}.`
});

console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-2.5-flash",
    "input": "What is the weather like in New York today?",
    "tools": [{
        "type": "mcp_server",
        "name": "weather_service",
        "url": "https://gemini-api-demos.uc.r.appspot.com/mcp"
    }],
    "system_instruction": "Today is '"$(date +"%du%Bt%Y")"' YYYY-MM-DD>."
}'

重要注意事項：

遠端 MCP 僅適用於可串流的 HTTP 伺服器 (不支援 SSE 伺服器)
遠端 MCP 無法搭配 Gemini 3 模型使用 (即將推出)
MCP 伺服器名稱不得包含「-」字元 (請改用 snake_case 伺服器名稱)

結構化輸出內容 (JSON 結構定義)

在 response_format 參數中提供 JSON 結構定義，強制模型以特定 JSON 格式輸出內容。這項功能適用於內容審查、分類或資料擷取等工作。

Python

from google import genai
from pydantic import BaseModel, Field
from typing import Literal, Union
client = genai.Client()

class SpamDetails(BaseModel):
    reason: str = Field(description="The reason why the content is considered spam.")
    spam_type: Literal["phishing", "scam", "unsolicited promotion", "other"]

class NotSpamDetails(BaseModel):
    summary: str = Field(description="A brief summary of the content.")
    is_safe: bool = Field(description="Whether the content is safe for all audiences.")

class ModerationResult(BaseModel):
    decision: Union[SpamDetails, NotSpamDetails]

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Moderate the following content: 'Congratulations! You've won a free cruise. Click here to claim your prize: www.definitely-not-a-scam.com'",
    response_format=ModerationResult.model_json_schema(),
)

parsed_output = ModerationResult.model_validate_json(interaction.outputs[-1].text)
print(parsed_output)

JavaScript

import { GoogleGenAI } from '@google/genai';
import { z } from 'zod';
const client = new GoogleGenAI({});

const moderationSchema = z.object({
    decision: z.union([
        z.object({
            reason: z.string().describe('The reason why the content is considered spam.'),
            spam_type: z.enum(['phishing', 'scam', 'unsolicited promotion', 'other']).describe('The type of spam.'),
        }).describe('Details for content classified as spam.'),
        z.object({
            summary: z.string().describe('A brief summary of the content.'),
            is_safe: z.boolean().describe('Whether the content is safe for all audiences.'),
        }).describe('Details for content classified as not spam.'),
    ]),
});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: "Moderate the following content: 'Congratulations! You've won a free cruise. Click here to claim your prize: www.definitely-not-a-scam.com'",
    response_format: z.toJSONSchema(moderationSchema),
});
console.log(interaction.outputs[0].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Moderate the following content: 'Congratulations! You've won a free cruise. Click here to claim your prize: www.definitely-not-a-scam.com'",
    "response_format": {
        "type": "object",
        "properties": {
            "decision": {
                "type": "object",
                "properties": {
                    "reason": {"type": "string", "description": "The reason why the content is considered spam."},
                    "spam_type": {"type": "string", "description": "The type of spam."}
                },
                "required": ["reason", "spam_type"]
            }
        },
        "required": ["decision"]
    }
}'

結合工具和結構化輸出內容

結合內建工具和結構化輸出內容，根據工具擷取的資訊取得可靠的 JSON 物件。

Python

from google import genai
from pydantic import BaseModel, Field
from typing import Literal, Union

client = genai.Client()

class SpamDetails(BaseModel):
    reason: str = Field(description="The reason why the content is considered spam.")
    spam_type: Literal["phishing", "scam", "unsolicited promotion", "other"]

class NotSpamDetails(BaseModel):
    summary: str = Field(description="A brief summary of the content.")
    is_safe: bool = Field(description="Whether the content is safe for all audiences.")

class ModerationResult(BaseModel):
    decision: Union[SpamDetails, NotSpamDetails]

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Moderate the following content: 'Congratulations! You've won a free cruise. Click here to claim your prize: www.definitely-not-a-scam.com'",
    response_format=ModerationResult.model_json_schema(),
    tools=[{"type": "url_context"}]
)

parsed_output = ModerationResult.model_validate_json(interaction.outputs[-1].text)
print(parsed_output)

JavaScript

import { GoogleGenAI } from '@google/genai';
import { z } from 'zod'; // Assuming zod is used for schema generation, or define manually
const client = new GoogleGenAI({});

const obj = z.object({
    winning_team: z.string(),
    score: z.string(),
});
const schema = z.toJSONSchema(obj);

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Who won the last euro?',
    tools: [{ type: 'google_search' }],
    response_format: schema,
});
console.log(interaction.outputs[0].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Who won the last euro?",
    "tools": [{"type": "google_search"}],
    "response_format": {
        "type": "object",
        "properties": {
            "winning_team": {"type": "string"},
            "score": {"type": "string"}
        }
    }
}'

進階功能

此外，還有其他進階功能，可讓您更彈性地使用 Interactions API。

串流

在生成回覆的同時接收回覆。

Python

from google import genai

client = genai.Client()

stream = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Explain quantum entanglement in simple terms.",
    stream=True
)

for chunk in stream:
    if chunk.event_type == "content.delta":
        if chunk.delta.type == "text":
            print(chunk.delta.text, end="", flush=True)
        elif chunk.delta.type == "thought":
            print(chunk.delta.thought, end="", flush=True)
    elif chunk.event_type == "interaction.complete":
        print(f"\n\n--- Stream Finished ---")
        print(f"Total Tokens: {chunk.interaction.usage.total_tokens}")

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const stream = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Explain quantum entanglement in simple terms.',
    stream: true,
});

for await (const chunk of stream) {
    if (chunk.event_type === 'content.delta') {
        if (chunk.delta.type === 'text' && 'text' in chunk.delta) {
            process.stdout.write(chunk.delta.text);
        } else if (chunk.delta.type === 'thought' && 'thought' in chunk.delta) {
            process.stdout.write(chunk.delta.thought);
        }
    } else if (chunk.event_type === 'interaction.complete') {
        console.log('\n\n--- Stream Finished ---');
        console.log(`Total Tokens: ${chunk.interaction.usage.total_tokens}`);
    }
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?alt=sse" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Explain quantum entanglement in simple terms.",
    "stream": true
}'

串流事件類型

啟用串流功能後，API 會傳回伺服器傳送事件 (SSE)。每個事件都有 event_type 欄位，指出事件用途。如需事件類型的完整清單，請參閱 API 參考資料。

事件類型	說明
`interaction.start`	第一個活動。包含互動 `id` 和初始 `status` (`in_progress`)。
`interaction.status_update`	表示狀態變更 (例如 `in_progress`)。
`content.start`	標示新輸出區塊的開頭。包含 `index` 和內容 `type` (例如`text`、`thought`)。
`content.delta`	分批更新內容。包含以 `delta.type` 為鍵的部分資料。
`content.stop`	在 `index` 標記輸出區塊的結尾。
`interaction.complete`	最終事件。包含 `id`、`status`、`usage` 和中繼資料。注意： `outputs` 是 `None`，您必須從 `content.*` 事件重建輸出內容。
`error`	表示發生錯誤。包含 `error.code` 和 `error.message`。

從串流事件重建 Interaction 物件

與非逐句顯示回覆不同，逐句顯示回覆不會包含 outputs 陣列。您必須透過累積 content.delta 事件的內容來重建輸出內容。

Python

from google import genai

client = genai.Client()

stream = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Write a haiku about Python programming.",
    stream=True
)

# Accumulate outputs by index
outputs = {}
usage = None

for chunk in stream:
    if chunk.event_type == "content.start":
        outputs[chunk.index] = {"type": chunk.content.type}

    elif chunk.event_type == "content.delta":
        output = outputs[chunk.index]
        if chunk.delta.type == "text":
            output["text"] = output.get("text", "") + chunk.delta.text
        elif chunk.delta.type == "thought_signature":
            output["signature"] = chunk.delta.signature
        elif chunk.delta.type == "thought_summary":
            output["summary"] = output.get("summary", "") + getattr(chunk.delta.content, "text", "")

    elif chunk.event_type == "interaction.complete":
        usage = chunk.interaction.usage

# Final outputs list (sorted by index)
final_outputs = [outputs[i] for i in sorted(outputs.keys())]
print(f"\n\nOutputs: {final_outputs}")

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const stream = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Write a haiku about Python programming.',
    stream: true,
});

// Accumulate outputs by index
const outputs = new Map();
let usage = null;

for await (const chunk of stream) {
    if (chunk.event_type === 'content.start') {
        outputs.set(chunk.index, { type: chunk.content.type });

    } else if (chunk.event_type === 'content.delta') {
        const output = outputs.get(chunk.index);
        if (chunk.delta.type === 'text') {
            output.text = (output.text || '') + chunk.delta.text;
            process.stdout.write(chunk.delta.text);
        } else if (chunk.delta.type === 'thought_signature') {
            output.signature = chunk.delta.signature;
        } else if (chunk.delta.type === 'thought_summary') {
            output.summary = (output.summary || '') + (chunk.delta.content?.text || '');
        }

    } else if (chunk.event_type === 'interaction.complete') {
        usage = chunk.interaction.usage;
    }
}

// Final outputs list (sorted by index)
const finalOutputs = [...outputs.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([_, output]) => output);
console.log(`\n\nOutputs:`, finalOutputs);

設定

使用 generation_config 自訂模型行為。

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Tell me a story about a brave knight.",
    generation_config={
        "temperature": 0.7,
        "max_output_tokens": 500,
        "thinking_level": "low",
    }
)

print(interaction.outputs[-1].text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Tell me a story about a brave knight.',
    generation_config: {
        temperature: 0.7,
        max_output_tokens: 500,
        thinking_level: 'low',
    }
});

console.log(interaction.outputs[interaction.outputs.length - 1].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Tell me a story about a brave knight.",
    "generation_config": {
        "temperature": 0.7,
        "max_output_tokens": 500,
        "thinking_level": "low"
    }
}'

思考中

Gemini 2.5 和更新的模型會在生成回覆前，先進行稱為「思考」的內部推理程序。這有助於模型針對數學、程式設計和多步驟推理等複雜工作，生成更優質的答案。

思考程度

thinking_level 參數可讓您控制模型的推理深度：

等級	說明	支援的機型
`minimal`	針對大多數查詢，比照「不思考」設定。在某些情況下，模型可能只會進行極少的思考。盡量縮短延遲時間並降低成本。	僅限 Flash 模型 (例如 Gemini 3 Flash)
`low`	輕量推理，優先考量延遲時間和節省成本，適用於簡單的指令遵循和對話。	所有思考模型
`medium`	適合用於大多數工作。	僅限 Flash 模型 (例如 Gemini 3 Flash)
`high`	(預設)：盡可能深入推理。模型可能需要較長時間才能產生第一個權杖，但輸出內容會經過更仔細的推論。	所有思考模型

想法摘要

模型思考過程會以思考區塊 (type: "thought") 的形式顯示在回覆輸出內容中。您可以使用 thinking_summaries 參數，控制是否要接收人類可解讀的思考過程摘要：

值	說明
`auto`	(預設)：在適用的情況下，傳回想法摘要。
`none`	停用想法摘要。

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Solve this step by step: What is 15% of 240?",
    generation_config={
        "thinking_level": "high",
        "thinking_summaries": "auto"
    }
)

for output in interaction.outputs:
    if output.type == "thought":
        print(f"Thinking: {output.summary}")
    elif output.type == "text":
        print(f"Answer: {output.text}")

JavaScript

import { GoogleGenAI } from '@google/genai';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: 'Solve this step by step: What is 15% of 240?',
    generation_config: {
        thinking_level: 'high',
        thinking_summaries: 'auto'
    }
});

for (const output of interaction.outputs) {
    if (output.type === 'thought') {
        console.log(`Thinking: ${output.summary}`);
    } else if (output.type === 'text') {
        console.log(`Answer: ${output.text}`);
    }
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": "Solve this step by step: What is 15% of 240?",
    "generation_config": {
        "thinking_level": "high",
        "thinking_summaries": "auto"
    }
}'

每個想法區塊都包含 signature 欄位 (內部推理狀態的加密雜湊) 和選用的 summary 欄位 (模型推理過程的摘要，方便使用者閱讀)。signature 一律會顯示，但如果出現下列情況，思想方塊可能只會包含簽名，沒有摘要：

簡單要求：模型未進行充分的推理，因此無法生成摘要
thinking_summaries: "none"：摘要功能已明確停用

程式碼應一律處理 summary 為空或不存在的思維區塊。手動管理對話記錄 (無狀態模式) 時，您必須在後續要求中加入附有簽章的思維區塊，以驗證真實性。

處理檔案

使用遠端檔案

直接在 API 呼叫中使用遠端網址存取檔案。

Python

from google import genai
client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {
            "type": "image",
            "uri": "https://github.com/<github-path>/cats-and-dogs.jpg",
        },
        {"type": "text", "text": "Describe what you see."}
    ],
)
for output in interaction.outputs:
    if output.type == "text":
        print(output.text)

JavaScript

import { GoogleGenAI } from '@google/genai';
const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: [
        {
            type: 'image',
            uri: 'https://github.com/<github-path>/cats-and-dogs.jpg',
        },
        { type: 'text', text: 'Describe what you see.' }
    ],
});
for (const output of interaction.outputs) {
    if (output.type === 'text') {
        console.log(output.text);
    }
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": [
        {
            "type": "image",
            "uri": "https://github.com/<github-path>/cats-and-dogs.jpg"
        },
        {"type": "text", "text": "Describe what you see."}
    ]
}'

使用 Gemini Files API

請先將檔案上傳至 Gemini Files API，再使用這些檔案。

Python

from google import genai
import time
import requests
client = genai.Client()

# 1. Download the file
url = "https://github.com/philschmid/gemini-samples/raw/refs/heads/main/assets/cats-and-dogs.jpg"
response = requests.get(url)
with open("cats-and-dogs.jpg", "wb") as f:
    f.write(response.content)

# 2. Upload to Gemini Files API
file = client.files.upload(file="cats-and-dogs.jpg")

# 3. Wait for processing
while client.files.get(name=file.name).state != "ACTIVE":
    time.sleep(2)

# 4. Use in Interaction
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {
            "type": "image",
            "uri": file.uri,
        },
        {"type": "text", "text": "Describe what you see."}
    ],
)
for output in interaction.outputs:
    if output.type == "text":
        print(output.text)

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
import fetch from 'node-fetch';
const client = new GoogleGenAI({});

// 1. Download the file
const url = 'https://github.com/philschmid/gemini-samples/raw/refs/heads/main/assets/cats-and-dogs.jpg';
const filename = 'cats-and-dogs.jpg';
const response = await fetch(url);
const buffer = await response.buffer();
fs.writeFileSync(filename, buffer);

// 2. Upload to Gemini Files API
const myfile = await client.files.upload({ file: filename, config: { mimeType: 'image/jpeg' } });

// 3. Wait for processing
while ((await client.files.get({ name: myfile.name })).state !== 'ACTIVE') {
    await new Promise(resolve => setTimeout(resolve, 2000));
}

// 4. Use in Interaction
const interaction = await client.interactions.create({
    model: 'gemini-3-flash-preview',
    input: [
        { type: 'image', uri: myfile.uri, },
        { type: 'text', text: 'Describe what you see.' }
    ],
});
for (const output of interaction.outputs) {
    if (output.type === 'text') {
        console.log(output.text);
    }
}

REST

# 1. Upload the file (Requires File API setup)
# See https://ai.google.dev/gemini-api/docs/files for details.
# Assume FILE_URI is obtained from the upload step.

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "gemini-3-flash-preview",
    "input": [
        {"type": "image", "uri": "FILE_URI"},
        {"type": "text", "text": "Describe what you see."}
    ]
}'

資料模型

如要進一步瞭解資料模型，請參閱 API 參考資料。以下是主要元件的概略總覽。

互動

屬性	類型	說明
`id`	`string`	互動的專屬 ID。
`model`/`agent`	`string`	使用的模型或代理程式。只能提供一個。
`input`	`Content[]`	提供的輸入內容。
`outputs`	`Content[]`	模型的回覆。
`tools`	`Tool[]`	使用的工具。
`previous_interaction_id`	`string`	先前互動的 ID，用於提供脈絡資訊。
`stream`	`boolean`	互動是否為串流。
`status`	`string`	狀態：`completed`、`in_progress`、`requires_action`、`failed` 等。
`background`	`boolean`	互動是否處於背景模式。
`store`	`boolean`	是否要儲存互動內容。預設值：`true`。如要停用，請設為 `false`。
`usage`	用量	互動要求的權杖用量。

支援的模型和代理程式

模型名稱	類型	模型 ID
Gemini 2.5 Pro	型號	`gemini-2.5-pro`
Gemini 2.5 Flash	型號	`gemini-2.5-flash`
Gemini 2.5 Flash-lite	型號	`gemini-2.5-flash-lite`
Gemini 3 Pro 預先發布版	型號	`gemini-3-pro-preview`
Gemini 3 Flash 預先發布版	型號	`gemini-3-flash-preview`
Deep Research 搶先版	代理	`deep-research-pro-preview-12-2025`

Interactions API 的運作方式

Interactions API 是以中央資源 Interaction 為基礎設計。Interaction 代表對話或工作中的完整回合。這項物件會做為工作階段記錄，包含互動的完整記錄，包括所有使用者輸入內容、模型想法、工具呼叫、工具結果和最終模型輸出內容。

呼叫 interactions.create 時，您會建立新的 Interaction 資源。

伺服器端狀態管理

您可以在後續呼叫中使用 previous_interaction_id 參數，藉此使用已完成互動的 id 繼續對話。伺服器會使用這個 ID 擷取對話記錄，因此您不必重新傳送整個對話記錄。

系統只會使用 previous_interaction_id 保留對話記錄 (輸入內容和輸出內容)。其他參數為互動範圍，且僅適用於目前產生的特定互動：

tools
system_instruction
generation_config (包括 thinking_level、temperature 等)

也就是說，如要套用這些參數，您必須在每次新互動中重新指定。這項伺服器端狀態管理功能為選用功能，您也可以在無狀態模式下運作，方法是在每個要求中傳送完整的對話記錄。

資料儲存與保留

根據預設，所有 Interaction 物件都會儲存 (store=true)，以簡化伺服器端狀態管理功能 (使用 previous_interaction_id)、背景執行 (使用 background=true) 和可觀測性的使用。

付費層級：互動記錄會保留 55 天。
免費方案：互動記錄會保留 1 天。

如果不希望發生這種情況，可以在要求中設定 store=false。這項控制項與狀態管理功能無關，您可以選擇不儲存任何互動。不過請注意，store=false 與 background=true 不相容，且會禁止在後續回合使用 previous_interaction_id。

您隨時可以使用 API 參考資料中的刪除方法，刪除儲存的互動記錄。只有在知道互動 ID 的情況下，才能刪除互動。

保留期限過後，系統會自動刪除資料。

系統會根據條款處理互動物件。

最佳做法

快取命中率：使用 previous_interaction_id 繼續對話時，系統可以更輕鬆地運用對話記錄的隱含快取，進而提升效能並降低費用。
混合互動：您可以在對話中彈性混合搭配代理程式和模型互動。舉例來說，您可以先使用 Deep Research 代理等專用代理收集初步資料，然後使用標準 Gemini 模型執行後續工作，例如摘要或重新格式化，並以 previous_interaction_id 連結這些步驟。

SDK

您可以使用最新版的 Google GenAI SDK，存取 Interactions API。

在 Python 中，這是 1.55.0 版本以上的 google-genai 套件。
在 JavaScript 中，這是 @google/genai 套件的 1.33.0 版本。

如要進一步瞭解如何安裝 SDK，請參閱「程式庫」頁面。

限制

Beta 版狀態：Interactions API 目前為 Beta 版/搶先體驗版。功能和結構定義可能會有所異動。
不支援的功能：下列功能目前不支援，但很快就會推出：
- 運用 Google 地圖建立基準
輸出內容排序：內建工具 (google_search 和 url_context) 的內容排序有時可能不正確，文字會顯示在工具執行和結果之前。我們已知這個問題，正在設法解決。
工具組合：目前尚不支援組合使用 MCP、函式呼叫和內建工具，但這項功能即將推出。
遠端 MCP：Gemini 3 不支援遠端 MCP，但這項功能即將推出。

破壞性變更

目前 Interactions API 處於早期 Beta 版階段。我們會根據實際使用情況和開發人員意見回饋，積極開發及改良 API 功能、資源結構定義和 SDK 介面。

因此可能會發生破壞性變更。更新內容可能包括：

輸入和輸出內容的結構定義。
SDK 方法簽章和物件結構。
特定功能行為。

如為正式版工作負載，請繼續使用標準的 generateContent API。這個路徑仍是穩定部署的建議做法，而且我們會持續積極開發及維護。

意見回饋

您的意見回饋對 Interactions API 的開發至關重要。歡迎前往 Google AI 開發人員產品討論社群論壇分享想法、回報錯誤或要求功能。

後續步驟

請試用 Interactions API 快速入門筆記本。
進一步瞭解 Gemini Deep Research Agent。