Gemini 3 Flash 现已问世。在 Google AI Studio 中免费试用。

此页面由 Cloud Translation API 翻译。

Gemini 思考

Gemini 3 和 2.5 系列模型采用内部“思考过程”，可显著提升推理和多步规划能力，因此非常适合处理编码、高等数学和数据分析等复杂任务。

本指南介绍了如何使用 Gemini API 来利用 Gemini 的思考能力。

生成内容时进行思考

使用思考模型发起请求与发起任何其他内容生成请求类似。主要区别在于在 model 字段中指定支持思考功能的模型，如下面的文本生成示例所示：

Python

from google import genai

client = genai.Client()
prompt = "Explain the concept of Occam's Razor and provide a simple, everyday example."
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=prompt
)

print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  const prompt = "Explain the concept of Occam's Razor and provide a simple, everyday example.";

  const response = await ai.models.generateContent({
    model: "gemini-2.5-pro",
    contents: prompt,
  });

  console.log(response.text);
}

main();

Go

package main

import (
  "context"
  "fmt"
  "log"
  "os"
  "google.golang.org/genai"
)

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
      log.Fatal(err)
  }

  prompt := "Explain the concept of Occam's Razor and provide a simple, everyday example."
  model := "gemini-2.5-pro"

  resp, _ := client.Models.GenerateContent(ctx, model, genai.Text(prompt), nil)

  fmt.Println(resp.Text())
}

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent" \
 -H "x-goog-api-key: $GEMINI_API_KEY" \
 -H 'Content-Type: application/json' \
 -X POST \
 -d '{
   "contents": [
     {
       "parts": [
         {
           "text": "Explain the concept of Occam\'s Razor and provide a simple, everyday example."
         }
       ]
     }
   ]
 }'
 ```

思考总结

思考总结是模型原始思考的合成版本，可帮助您深入了解模型的内部推理过程。请注意，思考水平和预算适用于模型的原始想法，而不适用于想法总结。

您可以在请求配置中将 includeThoughts 设置为 true，以启用思路总结。然后，您可以通过迭代 response 参数的 parts 并检查 thought 布尔值来访问摘要。

以下示例展示了如何在不进行流式传输的情况下启用和检索思路总结，该方法会通过响应返回单个最终思路总结：

Python

from google import genai
from google.genai import types

client = genai.Client()
prompt = "What is the sum of the first 50 prime numbers?"
response = client.models.generate_content(
  model="gemini-2.5-pro",
  contents=prompt,
  config=types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(
      include_thoughts=True
    )
  )
)

for part in response.candidates[0].content.parts:
  if not part.text:
    continue
  if part.thought:
    print("Thought summary:")
    print(part.text)
    print()
  else:
    print("Answer:")
    print(part.text)
    print()

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-pro",
    contents: "What is the sum of the first 50 prime numbers?",
    config: {
      thinkingConfig: {
        includeThoughts: true,
      },
    },
  });

  for (const part of response.candidates[0].content.parts) {
    if (!part.text) {
      continue;
    }
    else if (part.thought) {
      console.log("Thoughts summary:");
      console.log(part.text);
    }
    else {
      console.log("Answer:");
      console.log(part.text);
    }
  }
}

main();

Go

package main

import (
  "context"
  "fmt"
  "google.golang.org/genai"
  "os"
)

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
      log.Fatal(err)
  }

  contents := genai.Text("What is the sum of the first 50 prime numbers?")
  model := "gemini-2.5-pro"
  resp, _ := client.Models.GenerateContent(ctx, model, contents, &genai.GenerateContentConfig{
    ThinkingConfig: &genai.ThinkingConfig{
      IncludeThoughts: true,
    },
  })

  for _, part := range resp.Candidates[0].Content.Parts {
    if part.Text != "" {
      if part.Thought {
        fmt.Println("Thoughts Summary:")
        fmt.Println(part.Text)
      } else {
        fmt.Println("Answer:")
        fmt.Println(part.Text)
      }
    }
  }
}

以下示例展示了如何使用流式思考，该功能可在生成期间返回滚动式增量摘要：

Python

from google import genai
from google.genai import types

client = genai.Client()

prompt = """
Alice, Bob, and Carol each live in a different house on the same street: red, green, and blue.
The person who lives in the red house owns a cat.
Bob does not live in the green house.
Carol owns a dog.
The green house is to the left of the red house.
Alice does not own a cat.
Who lives in each house, and what pet do they own?
"""

thoughts = ""
answer = ""

for chunk in client.models.generate_content_stream(
    model="gemini-2.5-pro",
    contents=prompt,
    config=types.GenerateContentConfig(
      thinking_config=types.ThinkingConfig(
        include_thoughts=True
      )
    )
):
  for part in chunk.candidates[0].content.parts:
    if not part.text:
      continue
    elif part.thought:
      if not thoughts:
        print("Thoughts summary:")
      print(part.text)
      thoughts += part.text
    else:
      if not answer:
        print("Answer:")
      print(part.text)
      answer += part.text

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

const prompt = `Alice, Bob, and Carol each live in a different house on the same
street: red, green, and blue. The person who lives in the red house owns a cat.
Bob does not live in the green house. Carol owns a dog. The green house is to
the left of the red house. Alice does not own a cat. Who lives in each house,
and what pet do they own?`;

let thoughts = "";
let answer = "";

async function main() {
  const response = await ai.models.generateContentStream({
    model: "gemini-2.5-pro",
    contents: prompt,
    config: {
      thinkingConfig: {
        includeThoughts: true,
      },
    },
  });

  for await (const chunk of response) {
    for (const part of chunk.candidates[0].content.parts) {
      if (!part.text) {
        continue;
      } else if (part.thought) {
        if (!thoughts) {
          console.log("Thoughts summary:");
        }
        console.log(part.text);
        thoughts = thoughts + part.text;
      } else {
        if (!answer) {
          console.log("Answer:");
        }
        console.log(part.text);
        answer = answer + part.text;
      }
    }
  }
}

await main();

Go

package main

import (
  "context"
  "fmt"
  "log"
  "os"
  "google.golang.org/genai"
)

const prompt = `
Alice, Bob, and Carol each live in a different house on the same street: red, green, and blue.
The person who lives in the red house owns a cat.
Bob does not live in the green house.
Carol owns a dog.
The green house is to the left of the red house.
Alice does not own a cat.
Who lives in each house, and what pet do they own?
`

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
      log.Fatal(err)
  }

  contents := genai.Text(prompt)
  model := "gemini-2.5-pro"

  resp := client.Models.GenerateContentStream(ctx, model, contents, &genai.GenerateContentConfig{
    ThinkingConfig: &genai.ThinkingConfig{
      IncludeThoughts: true,
    },
  })

  for chunk := range resp {
    for _, part := range chunk.Candidates[0].Content.Parts {
      if len(part.Text) == 0 {
        continue
      }

      if part.Thought {
        fmt.Printf("Thought: %s\n", part.Text)
      } else {
        fmt.Printf("Answer: %s\n", part.Text)
      }
    }
  }
}

控制思维

Gemini 模型默认采用动态思考，会根据用户请求的复杂程度自动调整推理力度。不过，如果您有特定的延迟时间限制条件，或者需要模型进行比平时更深入的推理，可以选择使用参数来控制思考行为。

思考等级 (Gemini 3)

建议为 Gemini 3 及更高版本的模型使用 thinkingLevel 参数，该参数可用于控制推理行为。您可以将 Gemini 3 Pro 的思考水平设置为 "low" 或 "high"，并将 Gemini 3 Flash 的思考水平设置为 "minimal"、"low"、"medium" 和 "high"。

Gemini 3 Pro 和 Flash 的思维水平：

low：最大限度地缩短延迟时间并降低费用。最适合简单指令遵循、聊天或高吞吐量应用
high（默认，动态）：最大限度地提高推理深度。模型可能需要更长时间才能生成第一个 token，但输出结果会经过更仔细的推理。

Gemini 3 Flash thinking 水平

除了上述级别之外，Gemini 3 Flash 还支持以下 Gemini 3 Pro 目前不支持的思维级别：

medium：平衡思考，适用于大多数任务。
minimal：与大多数查询的“不思考”设置相匹配。对于复杂的编码任务，模型可能只会进行非常简单的思考。最大限度地缩短聊天或高吞吐量应用的延迟时间。

注意： minimal 无法保证思考已关闭。

Python

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Provide a list of 3 famous physicists and their key contributions",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="low")
    ),
)

print(response.text)

JavaScript

import { GoogleGenAI, ThinkingLevel } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  const response = await ai.models.generateContent({
    model: "gemini-3-flash-preview",
    contents: "Provide a list of 3 famous physicists and their key contributions",
    config: {
      thinkingConfig: {
        thinkingLevel: ThinkingLevel.LOW,
      },
    },
  });

  console.log(response.text);
}

main();

Go

package main

import (
  "context"
  "fmt"
  "google.golang.org/genai"
  "os"
)

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
      log.Fatal(err)
  }

  thinkingLevelVal := "low"

  contents := genai.Text("Provide a list of 3 famous physicists and their key contributions")
  model := "gemini-3-flash-preview"
  resp, _ := client.Models.GenerateContent(ctx, model, contents, &genai.GenerateContentConfig{
    ThinkingConfig: &genai.ThinkingConfig{
      ThinkingLevel: &thinkingLevelVal,
    },
  })

fmt.Println(resp.Text())
}

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
  "contents": [
    {
      "parts": [
        {
          "text": "Provide a list of 3 famous physicists and their key contributions"
        }
      ]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
          "thinkingLevel": "low"
    }
  }
}'

对于 Gemini 3 Pro，您无法停用思考功能。Gemini 3 Flash 也不支持完全关闭思考，但 minimal 设置意味着模型可能不会思考（尽管它仍然有可能思考）。如果您未指定思考水平，Gemini 将使用 Gemini 3 模型的默认动态思考水平 "high"。

Gemini 2.5 系列模型不支持 thinkingLevel；请改用 thinkingBudget。

思考预算

thinkingBudget 参数是随 Gemini 2.5 系列推出的，用于为模型提供指导，帮助其了解用于推理的思考 token 的具体数量。

以下是每种模型类型的 thinkingBudget 配置详细信息。您可以通过将 thinkingBudget 设置为 0 来停用思考功能。将 thinkingBudget 设置为 -1 会开启动态思考，这意味着模型会根据请求的复杂程度调整预算。

型号	默认设置（未设置思考预算）	Range	停用思考	开启动态思维
2.5 Pro	动态思考：模型决定何时以及思考多少	`128`到`32768`	不适用：无法停用思考功能	`thinkingBudget = -1`
2.5 Flash	动态思考：模型决定何时以及思考多少	`0`到`24576`	`thinkingBudget = 0`	`thinkingBudget = -1`
2.5 Flash 预览版	动态思考：模型决定何时以及思考多少	`0`到`24576`	`thinkingBudget = 0`	`thinkingBudget = -1`
2.5 Flash Lite	模型不会思考	`512`到`24576`	`thinkingBudget = 0`	`thinkingBudget = -1`
2.5 Flash Lite 预览版	模型不会思考	`512`到`24576`	`thinkingBudget = 0`	`thinkingBudget = -1`
Robotics-ER 1.5 预览版	动态思考：模型决定何时以及思考多少	`0`到`24576`	`thinkingBudget = 0`	`thinkingBudget = -1`
2.5 Flash Live 原生音频预览版 (09-2025)	动态思考：模型决定何时以及思考多少	`0`到`24576`	`thinkingBudget = 0`	`thinkingBudget = -1`

Python

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Provide a list of 3 famous physicists and their key contributions",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
        # Turn off thinking:
        # thinking_config=types.ThinkingConfig(thinking_budget=0)
        # Turn on dynamic thinking:
        # thinking_config=types.ThinkingConfig(thinking_budget=-1)
    ),
)

print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: "Provide a list of 3 famous physicists and their key contributions",
    config: {
      thinkingConfig: {
        thinkingBudget: 1024,
        // Turn off thinking:
        // thinkingBudget: 0
        // Turn on dynamic thinking:
        // thinkingBudget: -1
      },
    },
  });

  console.log(response.text);
}

main();

Go

package main

import (
  "context"
  "fmt"
  "google.golang.org/genai"
  "os"
)

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
      log.Fatal(err)
  }

  thinkingBudgetVal := int32(1024)

  contents := genai.Text("Provide a list of 3 famous physicists and their key contributions")
  model := "gemini-2.5-flash"
  resp, _ := client.Models.GenerateContent(ctx, model, contents, &genai.GenerateContentConfig{
    ThinkingConfig: &genai.ThinkingConfig{
      ThinkingBudget: &thinkingBudgetVal,
      // Turn off thinking:
      // ThinkingBudget: int32(0),
      // Turn on dynamic thinking:
      // ThinkingBudget: int32(-1),
    },
  })

fmt.Println(resp.Text())
}

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
  "contents": [
    {
      "parts": [
        {
          "text": "Provide a list of 3 famous physicists and their key contributions"
        }
      ]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
          "thinkingBudget": 1024
    }
  }
}'

根据提示的不同，模型可能会超出或低于令牌预算。

思考签名

Gemini API 是无状态的，因此模型会独立处理每个 API 请求，并且无法访问多轮交互中之前轮次的思考上下文。

为了能够在多轮互动中保持思考上下文，Gemini 会返回思考签名，这是模型内部思考过程的加密表示形式。

Gemini 2.5 系列模型会在启用思考功能且请求包含函数调用（具体来说是函数声明）时返回思考签名。
Gemini 3 模型可能会针对所有类型的部分返回思路签名。我们建议您始终按原样传递所有签名，但对于函数调用签名，这是必需的。如需了解详情，请参阅思路签名页面。

注意：即使将 Gemini Flash 3 设置为 minimal，也需要流通思想签名。

Google GenAI SDK 会自动处理思考签名的返回。只有在修改对话历史记录或使用 REST API 时，才需要手动管理想法签名。

使用函数调用时，还需考虑以下用量限制：

特征由模型在回答的其他部分（例如函数调用或文本部分）中返回。在后续对话轮次中，将包含所有部分的完整回答返回给模型。
请勿将带有签名的部分串联在一起。
请勿将包含签名的部分与不包含签名的部分合并。

价格

开启思考功能后，回答价格是输出 token 和思考 token 的总和。您可以从 thoughtsTokenCount 字段获取生成的思考令牌总数。

Python

# ...
print("Thoughts tokens:",response.usage_metadata.thoughts_token_count)
print("Output tokens:",response.usage_metadata.candidates_token_count)

JavaScript

// ...
console.log(`Thoughts tokens: ${response.usageMetadata.thoughtsTokenCount}`);
console.log(`Output tokens: ${response.usageMetadata.candidatesTokenCount}`);

Go

// ...
usageMetadata, err := json.MarshalIndent(response.UsageMetadata, "", "  ")
if err != nil {
  log.Fatal(err)
}
fmt.Println("Thoughts tokens:", string(usageMetadata.thoughts_token_count))
fmt.Println("Output tokens:", string(usageMetadata.candidates_token_count))

思考模型会生成完整的想法，以提高最终回答的质量，然后输出摘要，以便深入了解思考过程。因此，定价是根据模型生成摘要所需的完整思维令牌数来确定的，尽管 API 只输出摘要。

如需详细了解令牌，请参阅令牌计数指南。

最佳做法

本部分包含一些有关如何高效使用思考模型的指导。与往常一样，遵循我们的提示指南和最佳实践将有助于您获得最佳结果。

调试和引导

检查推理过程：当推理模型未给出您预期的回答时，仔细分析 Gemini 的推理总结会有所帮助。您可以了解模型如何分解任务并得出结论，并使用这些信息来修正结果，使其更符合预期。
在推理中提供指导：如果您希望输出内容特别长，不妨在提示中提供指导，限定模型使用的思考量。这样，您就可以为回答预留更多 token 输出。

任务复杂性

简单任务（可关闭思考）：对于不需要复杂推理的简单请求（例如事实检索或分类），无需思考。例如：
- “DeepMind 是在哪里创立的？”
- “这封电子邮件是要求安排会议，还是仅提供信息？”
中等任务（默认/需要一定程度的思考）：许多常见请求都需要一定程度的分步处理或更深入的理解。Gemini 可以灵活运用思考能力来完成以下任务：
- 将光合作用和成长进行类比。
- 比较并对比电动汽车和混合动力汽车。
困难任务（最大思考能力）：对于真正复杂的挑战，例如解决复杂的数学问题或编码任务，我们建议设置较高的思考预算。这类任务要求模型充分发挥推理和规划能力，通常需要经过许多内部步骤才能提供答案。例如：
- 解决 2025 年 AIME 中的问题 1：求出所有整数基数 b > 9 的总和，使得 17_b 是 97_b 的除数。
- 为可直观呈现实时股票市场数据的 Web 应用编写 Python 代码，包括用户身份验证。尽可能提高效率。

支持的模型、工具和功能

所有 3 系列和 2.5 系列模型均支持思考功能。您可以在模型概览页面上找到所有模型功能。

思考模型可与 Gemini 的所有工具和功能搭配使用。这使模型能够与外部系统互动、执行代码或访问实时信息，并将结果纳入其推理和最终回答中。

您可以在思考食谱中尝试将工具与思考模型搭配使用的示例。

后续操作

如需了解覆盖范围，请参阅我们的 OpenAI 兼容性指南。