Gemini 思考

Gemini 2.5 系列模型在生成回答期间会使用内部“思考过程”。此过程有助于提高孩子的推理能力,并帮助他们使用多步骤规划来解决复杂的任务。因此,这些模型特别擅长编码、高级数学、数据分析以及其他需要规划或思考的任务。

本指南介绍了如何使用 Gemini API 处理 Gemini 的思考功能。

使用思维模型

具有思考能力的模型可在 Google AI Studio 中使用,也可通过 Gemini API 使用。默认情况下,API 和 AI Studio 中的思考功能处于开启状态,因为 2.5 系列模型能够根据问题自动决定何时思考以及思考多少。对于大多数用例,让系统保持思考状态是有益的。不过,如果您想关闭思考功能,可以将 thinkingBudget 参数设置为 0。

发送基本请求

from google import genai

client = genai.Client(api_key="GOOGLE_API_KEY")
prompt = "Explain the concept of Occam's Razor and provide a simple, everyday example."
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=prompt
)

print(response.text)
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });

async function main() {
  const prompt = "Explain the concept of Occam's Razor and provide a simple, everyday example.";

  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-preview-04-17",
    contents: prompt,
  });

  console.log(response.text);
}

main();
// import packages here

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("GOOGLE_API_KEY")))
  if err != nil {
    log.Fatal(err)
  }
  defer client.Close()

  model := client.GenerativeModel("gemini-2.5-flash-preview-04-17")
  resp, err := model.GenerateContent(ctx, genai.Text("Explain the concept of Occam's Razor and provide a simple, everyday example."))
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println(resp.Text())
}
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-04-17:generateContent?key=$GOOGLE_API_KEY" \
 -H 'Content-Type: application/json' \
 -X POST \
 -d '{
   "contents": [
     {
       "parts": [
         {
           "text": "Explain the concept of Occam\''s Razor and provide a simple, everyday example."
         }
       ]
     }
   ]
 }'
 ```

为思考模型设置预算

thinkingBudget 参数可为模型提供有关其在生成回答时可以使用的思考令牌数量的指导。令牌数量越多,通常与更详细的思考相关,而这对于解决更复杂的任务至关重要。thinkingBudget 必须是介于 0 到 24576 之间的整数。将思考预算设置为 0 会停用思考。

根据问题,模型可能会超出或低于令牌预算。

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Explain the Occam's Razor concept and provide everyday examples of it",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)

print(response.text)
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });

async function main() {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-preview-04-17",
    contents: "Explain the Occam's Razor concept and provide everyday examples of it",
    config: {
      thinkingConfig: {
        thinkingBudget: 1024,
      },
    },
  });

  console.log(response.text);
}

main();
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-04-17:generateContent?key=$GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
  "contents": [
    {
      "parts": [
        {
          "text": "Explain the Occam\''s Razor concept and provide everyday examples of it"
        }
      ]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
          "thinkingBudget": 1024
    }
  }
}'

将工具与思维模型搭配使用

思考模型除了生成文本之外,还可以使用工具执行其他操作。这样,它们就可以与外部系统交互、执行代码或访问实时信息,并将结果纳入推理和最终回答中。

搜索工具

借助搜索工具,模型可以查询外部搜索引擎,以查找最新信息或超出其训练数据的信息。这对于有关近期事件或极其具体的主题的问题非常有用。

如需配置搜索工具,请参阅配置搜索工具

What were the major scientific breakthroughs announced last week?
  
Based on recent search results, here are some highlights from the past week in science:

...
  

代码执行

借助代码执行工具,模型可以生成和运行 Python 代码,以执行计算、处理数据或解决最好通过算法处理的问题。模型会接收代码的输出,并可以在其回答中使用该输出。

如需配置代码执行,请参阅在模型上启用代码执行

What is the sum of the first 50 prime numbers?
Generate and run Python code for the calculation, and make sure you get all 50.
Provide the final sum clearly.
  
Okay, I can calculate that. I will generate Python code to find the first 50 prime numbers and sum them up.

```python
def is_prime(num):
    if num < 2:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

primes = []
num = 2
while len(primes) < 50:
    if is_prime(num):
        primes.append(num)
    num += 1

total_sum = sum(primes)
print(total_sum)
```

Running the code...

Code Execution Result:
Outcome: OK
Output:

5117
Use code with caution.
The sum of the first 50 prime numbers is 5117.

利用结构化输出进行思考

借助结构化输出,您可以限制 Gemini 以 JSON 格式(一种适合自动处理的结构化输出格式)进行响应。这对于将模型输出集成到应用中特别有用。

您可以直接在提示中指定格式,也可以通过模型配置提供架构

Provide a list of 3 famous physicists and their key contributions
in JSON format.
Use this JSON schema:

Physicist = {'name': str, 'contribution': str, 'era': str}
Return: list[Physicist]
```json
[
{
"name": "Albert Einstein",
"contribution": "Theory of Relativity (Special and General)",
"era": "Early 20th Century"
},
{
"name": "Isaac Newton",
"contribution": "Laws of Motion and Universal Gravitation",
"era": "Late 17th / Early 18th Century"
},
{
"name": "Marie Curie",
"contribution": "Pioneering research on radioactivity",
"era": "Late 19th / Early 20th Century"
}
]
```

针对思维模型的提示技巧

有效的提示对于充分发挥 Gemini 思考型模型的优势至关重要。本部分简要介绍了一些针对思考模型的提示做法。

使用入门

首先,提供一些说明来描述预期结果。从一个简单的提示开始,观察模型在如何应对该挑战方面的初步想法。

Describe the most unusual or striking architectural feature you've ever seen in a building or structure.
  
<...>
The most unusual architectural feature that has truly stayed with me is the "floating roof" of the
  Heydar Aliyev Center in Baku, Azerbaijan, designed by Zaha Hadid Architects.

Visually, it's breathtaking. Imagine a vast, undulating landscape of pure white, seemingly draped
over the building. It's not just a roof in the traditional sense; it's an extension of the ground,
flowing seamlessly upwards and over the building's various spaces before curving back down to meet
the earth again. There are no sharp angles, no rigid lines. Instead, everything is fluid, organic,
and continuous. It resembles a giant, pristine white wave frozen in motion, or perhaps a sand dune
sculpted by an unseen wind.
<...>
  

提供分步说明

如果初次尝试未能取得预期结果,请提供更明确的分步说明来优化问题,引导 Gemini 完成问题解决流程,从而采用更专注的方法。这样,您就可以通过提供明确的建议和指导来引导模型对任务的初步理解。

Now imagine that architectural feature is not just for looks, but has a specific function or
purpose related to the building's environment or the people who use it. Explain what that
function could be and how it enhances the building's overall design and usability.
  
<...>
Let's imagine that the "floating roof" of the Heydar Aliyev Center is not purely aesthetic but
serves a crucial environmental and user-centric function: it's designed as a sophisticated climate
moderator and natural ventilation system for the Baku environment.

Here's how this function would work and enhance the building:

Function: Passive Climate Control and Natural Ventilation System

Shading and Solar Heat Mitigation: The undulating curves of the white GFRC roof are strategically
shaped and oriented to provide optimal shading throughout the day, year-round. Baku experiences
intense sun, especially during the summer. The overhanging curves, acting like giant brise-soleils,
would block direct sunlight from hitting the building's glass facade, significantly reducing solar
heat gain. The varying heights and depths of the curves would create dynamic shade patterns, ensuring
that different parts of the building are shaded at different times of the day, optimizing comfort
and reducing the need for excessive air conditioning. The white color of the GFRC further enhances
this by reflecting a large portion of the solar radiation.
<...>
  

多张照片提示(含思考)

结合使用多张照片提示和思考功能,可以进一步增强 Gemini 的理解能力,并提高其回答的准确性。提供多个输入-输出对示例,在提示中演示所需的行为和输出格式。

Example 1:
User: What is the tallest mountain in the world?
Assistant: Mount Everest

Example 2:
User: What is the largest ocean?
Assistant: Pacific Ocean

User: What is the longest river in the world?
Assistant:
  
Amazon River
  

定义输出和行为

在构建用户将直接与模型互动的应用时,您可能需要就 Gemini 的输出和回答的语音和格式提供指导。

系统指令

系统指令是指模型在处理提示之前所处理的一组指令。系统会在每次向模型提示时调用这些指令,并告知模型您希望其如何表现和响应。例如,您可以添加角色、背景信息和格式设置说明等内容。

System Instruction: You are a helpful and concise coding tutor. Always provide code examples and explain them briefly.

User Prompt: Explain what is a closure in Javascript and give an example.
  
    function outerFunction(outerVar) {
      function innerFunction(innerVar) {
        console.log(outerVar); // Accessing outerVar from outerFunction's scope
        console.log(innerVar);
      }
      return innerFunction;
    }

    const closureExample = outerFunction('I am from outer');
    closureExample('I am from inner');

Explanation:

A closure is created when a function (innerFunction in this example) is defined inside another function
(outerFunction) and the inner function remembers and has access to the variables from the outer
function's scope, even after the outer function has finished executing.

In the example:

outerFunction is called with 'I am from outer' and returns innerFunction.

closureExample now holds a reference to innerFunction.

When closureExample('I am from inner') is called, innerFunction executes.

Even though outerFunction has already finished, innerFunction can still access and use outerVar
(which is 'I am from outer') because of the closure.

In essence, the innerFunction "closes over" the variables of its outer function's scope, creating
a closure.
  

验证和反思

直接在“系统说明”或提示中添加具体验证步骤。例如,您可以让 Gemini 验证其来源、检查其推理过程、找出可能的错误,以及检查最终答案。

    Verify the following statement and provide your sources: The capital of Brazil is Rio de Janeiro.
  
The statement "The capital of Brazil is Rio de Janeiro" is incorrect.

The current capital of Brazil is Brasília.

Rio de Janeiro was the capital of Brazil for a significant period, from 1763 to 1960. However, the
capital was moved to Brasília in 1960 as part of a plan to develop the country's interior and create
a more centrally located capital.

Verification and Sources:

You can easily verify this information from numerous reliable sources:

Encyclopedias:

Britannica: Searching for "Brazil" on Britannica will clearly state Brasília as the capital.

https://www.britannica.com/place/Brazil (Look for the "Capital" information in the introductory summary or "Geography" section)

Wikipedia: The Wikipedia article for "Brazil" also states Brasília as the capital.

https://en.wikipedia.org/wiki/Brazil (Look in the infobox or introductory section)

Government and Official Sources:

CIA World Factbook: This is a highly reliable source for country information. Searching for "Brazil"
will list Brasília as the capital.

https://www.cia.gov/the-world-factbook/countries/brazil/ (Look under "Government" > "Capital")

Official Website of the Brazilian Government (if accessible and in English): While navigating official
government websites can sometimes be complex, they would definitively list Brasília as the capital.

Geographic Resources:

National Geographic: Their website and publications will confirm Brasília as the capital of Brazil.

Atlases and Geography Textbooks: Any modern atlas or geography textbook will list Brasília as the capital.

Why the confusion?

The confusion might arise because:

Rio de Janeiro is a very famous Brazilian city: It's a major tourist destination, known for its
beaches, Carnival, and iconic landmarks like Christ the Redeemer. Its past status as capital adds
to its historical significance and public recognition.

Historical context: Rio de Janeiro was the capital for a long time. People who learned about Brazil
in the past might remember it as the capital.

In conclusion, the statement is false. The capital of Brazil is Brasília, not Rio de Janeiro. Rio
de Janeiro was the former capital but is no longer the current capital.

You can confidently use the sources listed above to verify this information.
  

调试和引导

Google AI Studio 可让开发者在界面中查看模型的想法。这对开发者在调试时审核问题很有帮助:

分析思维过程

仔细查看 Gemini 的推理过程,找出可能出错的地方。此分析可以提供有价值的线索,帮助您优化提示,并在应用中始终如预期获得响应。

示例场景:如果 Gemini 的推理过程表明它误解了复杂说明中的某个步骤,您可以优化提示以阐明该特定步骤。

Step 1: Summarize the following article about quantum physics.
Step 2: Translate the summary into Spanish.
Step 3: Extract the key entities from the Spanish summary.

Article: ... [Article Text] ...
The user wants me to process an article about Google's new quantum chip called Willow in three steps:
Step 1: Summarize the article.
Step 2: Translate the summary into Spanish.
Step 3: Extract the key entities from the Spanish summary.

Let's break down each step.

Step 1: Summarize the article.
I need to read the article and identify the main points. The article is about Google's new quantum
chip, Willow, and its achievements. The key achievements are:

Exponential error reduction as qubits scale up.

Outperforming supercomputers in a benchmark computation (RCS).

Willow is a step towards commercially relevant quantum computing.

It's the first system below threshold for quantum error correction.

It achieved a benchmark computation in under 5 minutes that would take a supercomputer 10 septillion years.

It was fabricated in a new facility.

It has state-of-the-art performance and high quality qubits.

Google is looking towards real-world applications with Willow.

Quantum computing will benefit AI.

I will synthesize these points into a concise summary.

Step 2: Translate the summary into Spanish.
Once I have the summary in English, I will translate it into Spanish. I need to ensure the translation
is accurate and natural-sounding in Spanish.

Step 3: Extract the key entities from the Spanish summary.
After translating the summary, I will identify and extract the key entities. Key entities are typically
nouns or noun phrases that represent important objects, concepts, or actors. In this context, they
will likely include:

Names of chips (Willow, Sycamore)

Companies/Organizations (Google Quantum AI, Google Research)

Technologies (quantum chip, quantum computer, qubits, quantum error correction, superconducting
quantum system, classical computers, supercomputers, AI)

Benchmarks (random circuit sampling, RCS)

People (Julian Kelly, Peter Shor, David Deutsch, Sergio Boixo, Hartmut Neven, John Preskill)

Concepts (exponential error reduction, below threshold, beyond breakeven, state-of-the-art performance,
commercially relevant applications, quantum mechanics, multiverse, T1 times, system benchmarks,
algorithmic benchmarks)

I will need to carefully review the Spanish summary to identify these entities and list them out.

Provide Corrective Feedback

If you observe errors or inconsistencies in Gemini's reasoning, provide corrective feedback in your prompt and/or System Instructions to anticipate where the model's response may deviate from your desired outcome.

You are an expert in botany. When asked to identify a plant, focus on leaf shape, flower structure, and stem characteristics.

Identify this plant based on these characteristics: ... [Plant Characteristics Description] ...

Previous incorrect response focused on habitat only. Please disregard habitat and focus on morphology.
Previous incorrect response focused on habitat only. Please disregard habitat and focus on morphology.
Tabebuia vellosoi

充分利用长篇内容和长篇幅思考

思考功能会在令牌输出中生成“想法”,以展示 Gemini 的详细推理过程。如需充分利用这些输出,请执行以下操作:

查看推理过程

仔细观察 Gemini 的推理过程(可在 Google AI Studio 界面中查看),它如何拆解任务,以及它为输出做出的选择。这有助于您了解系统是如何得出结论的。

提供推理指导

如果您希望获得的输出特别长,不妨在问题中提供更多指导,以限制完成问题中所述任务所需的思考量。这样,您就可以为响应预留更多令牌输出。

示例:如需生成详细报告,您可以在提示中指定部分和子部分,以引导 Gemini 的思考并高效地构建输出内容。

Generate a detailed report on the impact of climate change on coastal ecosystems.

Report Structure:

Introduction

Rising Sea Levels
2.1. Impact on Mangroves
2.2. Impact on Coral Reefs

Ocean Acidification
3.1. Impact on Shellfish
3.2. Impact on Plankton

Extreme Weather Events
4.1. Increased Storm Intensity
4.2. Coastal Erosion

Conclusion

Provide detailed analysis within each section.
<...>
Climate Change Impact on Coastal Ecosystems: A Detailed Report

1. Introduction

Coastal ecosystems, the dynamic interface between land and sea, are among the most productive and
biodiverse environments on Earth. They provide crucial ecosystem services, including coastal protection,
fisheries, carbon sequestration, and support for a vast array of species. However, these vital
ecosystems are increasingly threatened by the multifaceted impacts of climate change. Driven by
anthropogenic greenhouse gas emissions, climate change is manifesting in several interconnected ways
that are profoundly altering the physical and chemical conditions of coastal environments. This report
will delve into the key impacts of climate change on coastal ecosystems, focusing on rising sea levels,
ocean acidification, and extreme weather events, and their specific consequences for critical habitats
like mangroves and coral reefs, and key organisms like shellfish and plankton. Understanding these
impacts is crucial for developing effective conservation and management strategies to mitigate the
damage and enhance the resilience of these invaluable ecosystems.

2. Rising Sea Levels

Rising sea levels are a direct and significant consequence of global warming, driven by thermal
expansion of seawater and the melting of land ice (glaciers and ice sheets). This phenomenon poses a
direct threat to low-lying coastal ecosystems by inundating habitats, altering salinity regimes, and
increasing erosion rates. The rate of sea level rise is accelerating, and projections for the 21st
century indicate a continued upward trend, with potentially devastating consequences for coastal
environments.

2.1. Impact on Mangroves

Mangrove forests are unique coastal ecosystems found in tropical and subtropical intertidal zones.
They provide numerous benefits, including coastal protection against storms, nursery grounds for
fish and invertebrates, and significant carbon sequestration...
<...>

更多可尝试的操作

  • 简单任务(可以关闭思考):对于简单的请求,不需要进行复杂的推理,例如简单的事实检索或分类,不需要思考。例如:
    • “DeepMind 的创始地在哪里?”
    • “这封电子邮件是想安排会议,还是只是提供信息?”
  • 中等任务(默认/需要思考):许多常见请求都需要进行一定程度的分步处理或更深入的理解。Gemini 可以灵活地使用思考功能来执行以下任务:
    • 将光合作用与成长进行类比。
    • 比较和对比电动汽车与混合动力汽车。
  • 高难度任务(最大限度发挥思考能力):对于真正复杂的挑战,AI 需要调用其全部推理和规划能力,通常需要完成许多内部步骤才能提供答案。例如:
    • 解 AIME 2025 题目 1:求所有大于 9 且 17b 是 97b 的因子的整数底数 b 的总和。
    • 为可直观呈现实时股市数据(包括用户身份验证)的 Web 应用编写 Python 代码。尽可能提高效率。

后续操作