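What's new in Gemini 3.5 Flash

Note: You are viewing the generateContent version of this guide. For new projects, we recommend the new Interactions API (Beta), which is built for agentic workflows, background tasks, and future Gemini capabilities.

Gemini 3.5 Flash is generally available (GA), stable, and ready for production use at scale. As our most intelligent Flash model, it delivers sustained frontier performance on agentic execution, coding, and long-horizon tasks.

This guide covers the improvements, API changes, and migration guidance for Gemini 3.5 Flash.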

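New model

Model	Model ID	Description
Gemini 3.5 Flash	`gemini-3.5-flash`	Google's most intelligent model, delivering sustained frontier performance on agentic and coding tasks.

Gemini 3.5 Flash supports a 1 million token context window, up to 65,000 output tokens, thinking, and the same set of tools and platform features as Gemini 3 Flash. Computer Use is not currently supported.

For full specifications, see the model overview. For pricing, see the pricing page.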

Quickstart

All examples in this guide use the GenerateContent API. The Interactions API is also supported, and the same configuration options and recommendations apply to it.

Python

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain how parallel agentic execution works in three sentences.",
)
print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  const response = await ai.models.generateContent({
    model: "gemini-3.5-flash",
    contents: "Explain how parallel agentic execution works in three sentences.",
  });
  console.log(response.text);
}

main();

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [{
      "parts": [{"text": "Explain how parallel agentic execution works in three sentences."}]
    }]
  }'

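Behavior changes

New default thinking level: `medium`

The default thinking level is now medium. In the Gemini 3 Flash preview it was high. medium delivers strong results across a wide range of tasks while being faster and cheaper. For complex problems, high encourages the model to think more deeply.

Thinking level	When to use
`minimal`	Optimized for response speed. Chat-like use cases, quick factual answers, and simpler tool calls.
`low`	Code and agentic tasks that need lower latency and fewer steps. Also a good fit for analysis and writing tasks that need some thinking.
`medium` (default)	Best quality for most tasks. Recommended for complex code and agentic use cases.
`high`	Maximizes the model's thinking and tool use. Best for complex reasoning, hard math problems, and the most difficult code or agentic tasks. Allows extended thinking and function calling.

To override the default, set thinking_level in the config: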

Python

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Prove that the square root of 2 is irrational.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)

print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  const response = await ai.models.generateContent({
    model: "gemini-3.5-flash",
    contents: "Prove that the square root of 2 is irrational.",
    config: {
      thinkingConfig: {
        thinkingLevel: "HIGH",
      },
    },
  });
  console.log(response.text);
}

main();

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [{
      "parts": [{"text": "Prove that the square root of 2 is irrational."}]
    }],
    "generationConfig": {
      "thinkingConfig": {
        "thinkingLevel": "HIGH"
      }
    }
  }'
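
The thinking-level table earlier in this section can be turned into a small lookup that picks a level per request. The following is a minimal sketch: the task categories (`chat`, `analysis`, `coding`, `hard_reasoning`) and the helper name are hypothetical and not part of the API. Only the level values and the `generationConfig.thinkingConfig.thinkingLevel` shape are taken from the REST example above.

```python
# Hypothetical task categories mapped to thinking levels, following the
# guidance table. The category names are illustrative, not part of the API.
LEVEL_BY_TASK = {
    "chat": "minimal",
    "analysis": "low",
    "coding": "medium",
    "hard_reasoning": "high",
}


def generation_config_for(task: str) -> dict:
    """Return a REST-shaped generationConfig for a task category.

    Unknown categories fall back to "medium", the documented default.
    """
    level = LEVEL_BY_TASK.get(task, "medium")
    # The REST example in this guide spells the level in upper case.
    return {"thinkingConfig": {"thinkingLevel": level.upper()}}


print(generation_config_for("hard_reasoning"))
```

Keeping the mapping in one place makes it easy to re-test quality, latency, and cost when a category moves to a different level.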

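Thought preservation

The model automatically preserves intermediate reasoning across multi-turn conversations. When the conversation history contains reasoning context, that context carries forward, which improves performance on complex multi-step tasks such as iterative debugging and code refactoring. No API changes are required:

Interactions API: Thoughts are preserved automatically. No change in behavior.
GenerateContent API: Starting with Gemini 3.5 Flash, the model uses the reasoning context from all previous turns when thought signatures are present in the conversation history. To enable this, pass the complete, unmodified conversation history (including thought signatures) in contents. The SDK handles this automatically.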
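
If you assemble requests by hand instead of relying on the SDK, the key point is to append the model's returned content to the history exactly as received. The following is a minimal sketch over REST-shaped dictionaries. It assumes the signature travels in a `thoughtSignature` field on a part; the helper name and the sample values are hypothetical.

```python
import copy


def append_turn(history: list, model_content: dict, user_text: str) -> list:
    """Return a new history: the model turn kept verbatim, then the next user turn."""
    new_history = copy.deepcopy(history)
    # Keep every part exactly as returned, including any thought signature.
    new_history.append(copy.deepcopy(model_content))
    new_history.append({"role": "user", "parts": [{"text": user_text}]})
    return new_history


history = [{"role": "user", "parts": [{"text": "Why does this test fail?"}]}]
model_content = {
    "role": "model",
    # "<opaque>" stands in for a signature returned by the API (assumed field name).
    "parts": [{"text": "The fixture is stale.", "thoughtSignature": "<opaque>"}],
}
history = append_turn(history, model_content, "Apply the fix to the other tests too.")
print(len(history))
```

Rebuilding the model turn from only its text or function call would drop the signature, so the sketch copies the whole content object instead.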

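Parameter updates and best practices in Gemini 3.x

The following applies to all Gemini 3.x models, including Gemini 3.5 Flash.

temperature, top_p, top_k: We strongly recommend leaving these at their defaults. Gemini 3's reasoning is optimized for the default settings.
thinking_level: Use thinking_level instead of thinking_budget.
Function call response matching: The id, name, and number of responses must match the preceding calls.
Multimodal function responses: Include multimodal content inside the function response, not outside it.
Inline instructions in function responses: Append them to the function response text instead of sending them as a separate part.
Reducing unnecessary tool calls: Use a lower thinking level, or experiment with system instructions, to cut tool calls in agentic workflows.

See the following sections for how to update your code.

Sampling parameters (no longer recommended)

temperature, top_p, and top_k are no longer recommended for any Gemini 3.x model. Gemini 3's reasoning is optimized for the default settings. Remove these parameters from all requests.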

# ⚠️ Remove these parameters (not recommended)
config = types.GenerateContentConfig(
    temperature=0.7,
    top_p=0.9,
    top_k=40
)

For more deterministic behavior, we recommend defining a system instruction with explicit rules for your specific use case.
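
If configuration is stored as plain dictionaries (for example, loaded from JSON or YAML), the sampling parameters can be removed in one place before a request is built. This is a minimal sketch; the helper name and the sample config are hypothetical, and both snake_case and camelCase spellings are filtered on the assumption that either may appear in stored configs.

```python
# Sampling parameters to drop, in both SDK (snake_case) and REST (camelCase) spellings.
SAMPLING_KEYS = ("temperature", "top_p", "top_k", "topP", "topK")


def strip_sampling_params(config: dict) -> dict:
    """Return a copy of a config dict without the sampling parameters."""
    return {key: value for key, value in config.items() if key not in SAMPLING_KEYS}


old_config = {"temperature": 0.7, "top_p": 0.9, "top_k": 40, "max_output_tokens": 1024}
print(strip_sampling_params(old_config))
```

The function returns a new dictionary and leaves the input untouched, so the stored config can be cleaned up separately.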

`thinking_budget` (no longer recommended)

The raw numeric thinking_budget parameter is no longer recommended for any Gemini 3.x model. Use the thinking_level string enum instead.

# ⚠️ Before (not recommended)
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=7500)
)

# ✅ After
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="medium")
)

Available values: minimal, low, medium (default), and high.
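
For configs kept as dictionaries, the same migration can be applied programmatically. This guide does not define a mapping from numeric budgets to levels, so the sketch below does not guess one: the caller chooses the level explicitly and the old budget is dropped. The helper name is hypothetical.

```python
VALID_LEVELS = ("minimal", "low", "medium", "high")


def migrate_thinking_config(thinking_config: dict, level: str = "medium") -> dict:
    """Drop thinking_budget and set thinking_level to an explicitly chosen value."""
    if level not in VALID_LEVELS:
        raise ValueError(f"thinking_level must be one of {VALID_LEVELS}, got {level!r}")
    # Copy everything except the numeric budget, then set the chosen level.
    migrated = {k: v for k, v in thinking_config.items() if k != "thinking_budget"}
    migrated["thinking_level"] = level
    return migrated


print(migrate_thinking_config({"thinking_budget": 7500}))
```

After migrating, compare quality and latency at the chosen level against the old budget, since the two settings are not guaranteed to behave the same.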

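Function calling: strict response matching

The Interactions API already returns an error for mismatched function responses. The GenerateContent API does not currently raise an error, but when responses do not match, the model in most cases returns an empty response with finish_reason: STOP. Always follow these conventions:

Requirement	Details
Include `id`	Every `FunctionResponse` must include the `id` from the corresponding `FunctionCall`
Match `name`	The `name` in the response must match the `name` in the call
Match count	Return exactly one `FunctionResponse` for every `FunctionCall` received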

Python

# ✅ Include matching id and name in the function response
final_response = client.models.generate_content(
    model="gemini-3.5-flash",
    config=config,
    contents=[
        *previous_contents,
        response.candidates[0].content,
        types.Content(role="user", parts=[
            types.Part.from_function_response(
                name=tool_call.name,
                response={"result": result},
                id=tool_call.id,
            )
        ]),
    ],
)

JavaScript

// ✅ Include matching id and name in the function response
const functionResponsePart = {
  functionResponse: {
    name: toolCall.name,
    response: { result: result },
    id: toolCall.id,
  },
};

const finalResponse = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: [
    ...previousContents,
    response.candidates[0].content, // model turn, unmodified (keeps thought signatures)
    { role: "user", parts: [functionResponsePart] },
  ],
  config: config,
});

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "..."}]},
      {"role": "model", "parts": [{"functionCall": {"name": "my_function", "args": {...}}}]},
      {"role": "user", "parts": [{"functionResponse": {"name": "my_function", "id": "call_id", "response": {"result": "..."}}}]}
    ]
  }'
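
Because the GenerateContent API does not reject mismatched responses, a client-side check before sending can catch these problems early. The following is a minimal sketch over plain dictionaries with `id` and `name` keys, mirroring the REST fields above. The function name and the sample calls are hypothetical.

```python
def check_function_responses(calls: list, responses: list) -> list:
    """Return a list of problems; an empty list means the responses match the calls."""
    problems = []
    # Requirement: exactly one response per call.
    if len(calls) != len(responses):
        problems.append(f"expected {len(calls)} responses, got {len(responses)}")
    calls_by_id = {call.get("id"): call for call in calls}
    for resp in responses:
        resp_id = resp.get("id")
        # Requirement: every response carries the id of its call.
        if resp_id is None:
            problems.append(f"response for {resp.get('name')!r} is missing id")
            continue
        call = calls_by_id.get(resp_id)
        if call is None:
            problems.append(f"response id {resp_id!r} matches no call")
        # Requirement: the name in the response matches the name in the call.
        elif call.get("name") != resp.get("name"):
            problems.append(
                f"id {resp_id!r}: name {resp.get('name')!r} does not match {call.get('name')!r}"
            )
    return problems


calls = [{"id": "call_1", "name": "get_weather", "args": {"city": "Paris"}}]
ok = [{"id": "call_1", "name": "get_weather", "response": {"result": "18C"}}]
missing_id = [{"name": "get_weather", "response": {"result": "18C"}}]
print(check_function_responses(calls, ok))
print(check_function_responses(calls, missing_id))
```

Running the check in tests or just before each request turns a silent empty response into an explicit, debuggable message.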

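Multimodal function responses

We often see clients provide images outside the function response. This can cause unexpected model behavior (such as thought leakage) and degrade output quality. Instead, follow the recommendations in the multimodal function response API documentation and add multimodal content inside the function response part sent to the model. The model can process this multimodal content in the next turn and produce a better-informed response.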

Python

# ✅ Include multimodal content in the function response
final_response = client.models.generate_content(
    model="gemini-3.5-flash",
    config=config,
    contents=[
        *previous_contents,
        response.candidates[0].content,
        types.Content(role="user", parts=[
            types.Part.from_function_response(
                name=tool_call.name,
                response={
                    "result": "instrument.jpg",
                    "image": base64_image_data,
                },
                id=tool_call.id,
            )
        ]),
    ],
)

JavaScript

// ✅ Include multimodal content in the function response
const finalResponse = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: [
    ...previousContents,
    response.candidates[0].content, // model turn, unmodified (keeps thought signatures)
    {
      role: "user",
      parts: [{
        functionResponse: {
          name: toolCall.name,
          id: toolCall.id,
          response: {
            result: "instrument.jpg",
            image: base64ImageData,
          },
        },
      }],
    },
  ],
  config: config,
});
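
The examples above assume the image is already available as base64 text. A minimal sketch of producing that payload from raw bytes follows. It reuses the field layout of the examples in this section (`result` and `image` inside `response`); the helper name and the sample bytes are hypothetical, and the multimodal function response documentation remains the reference for the exact schema.

```python
import base64
import json


def multimodal_function_response(name: str, call_id: str, filename: str, image_bytes: bytes) -> dict:
    """Build a REST-shaped functionResponse part that carries the image inside the response."""
    return {
        "functionResponse": {
            "name": name,
            "id": call_id,
            "response": {
                "result": filename,
                # Base64 text keeps the payload JSON-serializable.
                "image": base64.b64encode(image_bytes).decode("ascii"),
            },
        }
    }


# Placeholder bytes; in practice these would be read from the image file.
part = multimodal_function_response("get_image", "call_1", "instrument.jpg", b"\xff\xd8\xff")
print(json.dumps(part))
```

The image travels in the same part as the matching `id` and `name`, so nothing multimodal is left next to the function response.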

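Inline instructions in function responses

We often see clients provide extra instructions as additional Parts after the function response. This can cause unexpected model behavior (such as thought leakage) and degrade output quality. Instead, append any extra instructions to the end of the function response text, separated by two newlines.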

Python

import json

# ✅ Append inline instructions to the end of the function response separated by two newlines
result_text = f"{json.dumps(result)}\n\n<your inline instructions>"

final_response = client.models.generate_content(
    model="gemini-3.5-flash",
    config=config,
    contents=[
        *previous_contents,
        response.candidates[0].content,
        types.Content(role="user", parts=[
            types.Part.from_function_response(
                name=tool_call.name,
                response={"result": result_text},
                id=tool_call.id,
            )
        ]),
    ],
)

JavaScript

// ✅ Append inline instructions to the end of the function response separated by two newlines
const resultText = `${JSON.stringify(result)}\n\n<your inline instructions>`;

const finalResponse = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: [
    ...previousContents,
    response.candidates[0].content, // model turn, unmodified (keeps thought signatures)
    {
      role: "user",
      parts: [{
        functionResponse: {
          name: toolCall.name,
          id: toolCall.id,
          response: { result: resultText },
        },
      }],
    },
  ],
  config: config,
});
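
When several tools need this treatment, the formatting can live in one helper so the separator stays consistent. A minimal sketch follows; the helper name and sample values are hypothetical, and returning the bare result when there are no instructions is a design choice made here, not something the guide specifies.

```python
import json


def with_inline_instructions(result, instructions: str = "") -> str:
    """Serialize a tool result and append instructions after two newlines."""
    text = json.dumps(result)
    if not instructions.strip():
        # Nothing to append: return the serialized result unchanged.
        return text
    return f"{text}\n\n{instructions.strip()}"


print(with_inline_instructions({"temp_c": 18}, "Answer in one sentence."))
```

The returned string is what goes into the `result` field of the function response, in place of a separate instruction part.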

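Reducing unnecessary tool calls

If you see tool calls happening too often, two techniques can help:

Lower the thinking level first (medium, low, or minimal): Higher thinking levels encourage the model to use more tools to explore and verify, so lowering the level can reduce the number of tool calls.
Add a system instruction: If over-use persists after adjusting the thinking level, consider a prompt that limits tool use. For example: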
```
You have a limited action budget of <n> tool calls. Use them efficiently.
```
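
The `<n>` placeholder in the prompt above can be filled in per request. The following is a minimal sketch: the helper names are hypothetical, and the request body uses a REST-shaped `systemInstruction` field together with the `thinkingLevel` setting shown earlier, which is an assumption about how you assemble requests rather than a required layout.

```python
TOOL_BUDGET_TEMPLATE = "You have a limited action budget of {n} tool calls. Use them efficiently."


def tool_budget_instruction(n: int) -> str:
    """Fill the tool-call budget into the system instruction text."""
    if n < 1:
        raise ValueError("the tool-call budget must be at least 1")
    return TOOL_BUDGET_TEMPLATE.format(n=n)


def request_body(prompt: str, tool_budget: int, thinking_level: str = "low") -> dict:
    """Combine both techniques: a lower thinking level and a budget instruction."""
    return {
        "systemInstruction": {"parts": [{"text": tool_budget_instruction(tool_budget)}]},
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingLevel": thinking_level.upper()}},
    }


print(tool_budget_instruction(5))
```

Start with the thinking level alone and add the budget instruction only if tool use stays high, as the list above recommends.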

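Migration checklist

Migrating from the Gemini 3 Flash preview

Update the model name: gemini-3-flash-preview → gemini-3.5-flash
Review pricing. Gemini 3.5 Flash costs more than the Gemini 3 Flash preview. See the pricing page for details.
Remove temperature, top_p, and top_k from your config (no longer recommended).
Replace thinking_budget with thinking_level.
Add id and a matching name to all FunctionResponse parts.
Test your prompts. The default thinking level changed from high to medium, so verify quality, speed, and cost.
Thought preservation is now on by default. Reasoning context carries across conversation turns, which helps performance but may increase token usage.
Reduce unnecessary tool calls: lower the thinking level first (medium, low, or minimal); if tool over-use persists, add a system instruction that limits tool use.
Gemini 3.5 Flash does not currently support Computer Use. For Computer Use workloads, continue using the Gemini 3 Flash preview.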

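Migrating from Gemini 2.5

All of the above, plus:

Simplify prompts. If you previously used chain-of-thought prompt engineering to force reasoning, try thinking_level: "medium" or "high" with simpler prompts.
Test PDF and media workloads. If you relied on specific behavior for dense document parsing, test the media_resolution_high setting to make sure accuracy holds. Migrating to the Gemini 3 defaults may also increase token usage for PDFs and decrease it for video. If requests exceed the context window, explicitly lower media_resolution. See the media resolution documentation for details.
Take advantage of combined tool use. Google Search, URL context, code execution, and custom functions can be used in the same request.
If you use multimodal function responses, move multimodal content inside the function response part rather than alongside it.
If you use inline instructions with function responses, append them to the function response text, separated by two newlines, rather than sending them as a separate part.
Gemini 3.x does not support image segmentation. For segmentation workloads, continue using Gemini 2.5 Flash (with thinking turned off) or Gemini Robotics-ER 1.6.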
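
For the PDF and media item in the checklist above, it helps to make the media resolution an explicit setting so it can be raised for dense documents or lowered when a request exceeds the context window. The following is a minimal sketch over a REST-shaped `generationConfig`. The `mediaResolution` field name and the `MEDIA_RESOLUTION_*` values are assumptions here and should be checked against the media resolution documentation; the helper name is hypothetical.

```python
MEDIA_RESOLUTIONS = ("low", "medium", "high")


def with_media_resolution(generation_config: dict, level: str) -> dict:
    """Return a copy of generationConfig with an explicit media resolution."""
    if level not in MEDIA_RESOLUTIONS:
        raise ValueError(f"level must be one of {MEDIA_RESOLUTIONS}, got {level!r}")
    updated = dict(generation_config)
    # Assumed enum spelling: MEDIA_RESOLUTION_LOW / _MEDIUM / _HIGH.
    updated["mediaResolution"] = f"MEDIA_RESOLUTION_{level.upper()}"
    return updated


base = {"thinkingConfig": {"thinkingLevel": "MEDIUM"}}
print(with_media_resolution(base, "high"))
```

Comparing token usage and accuracy at two settings on a sample of your own documents is a quick way to pick a value before migrating the whole workload.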

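Gemini 3 family features

Gemini 3.5 Flash inherits all Gemini 3 family features except Computer Use. Features introduced in Gemini 3 that carry forward:

Thought signatures: Preserve encrypted reasoning context across API calls. Automatic in the Interactions API; implicit in GenerateContent.
Structured output with tools: Combine JSON schema output with built-in tools (Search, URL context, code execution, function calling).
Multimodal function responses: Return images, audio, and other media in function call results.
Code execution with images: Run code that processes and generates images.
Combined tool use: Use built-in tools and custom function calling in the same request.

Next steps

To learn more about the Gemini 3 family, see the Gemini 3 developer guide.
To learn more about prompt design strategies, see the prompt engineering guide.
Get started with the Gemini 3 Cookbook.
Learn about Gemini API optimization and inference.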
