教程:Gemini API 使用入门


在 Google AI 上查看 在 Google Colab 中运行 在 GitHub 上查看源代码

本快速入门演示了如何将 Python SDK 用于 Gemini API,以便访问 Google 的 Gemini 大语言模型。在本快速入门中,您将学习如何完成以下操作:

  1. 设置开发环境和 API 访问权限才能使用 Gemini。
  2. 根据文本输入生成文本响应。
  3. 根据多模态输入(文本和图片)生成文本响应。
  4. 使用 Gemini 进行多轮对话(聊天)。
  5. 将嵌入用于大型语言模型。

前提条件

您可以在 Google Colab 中运行本快速入门,它会直接在浏览器中运行此笔记本,而无需额外配置环境。

或者,如需在本地完成本快速入门,请确保您的开发环境满足以下要求:

  • Python 3.9 及更高版本
  • 安装用于运行笔记本的 jupyter

设置

安装 Python SDK

适用于 Gemini API 的 Python SDK 包含在 google-generativeai 软件包中。使用 pip 安装依赖项:

pip install -q -U google-generativeai

导入软件包

导入必要的软件包。

import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
# Used to securely store your API key
from google.colab import userdata

设置您的 API 密钥

您必须先获取 API 密钥,然后才能使用 Gemini API。如果您还没有密钥,只需在 Google AI Studio 中点击一下即可创建一个。

获取 API 密钥

在 Colab 中,将密钥添加到左侧面板“🔑?”下的 Secret 管理器中。将其命名为 GOOGLE_API_KEY

获得 API 密钥后,将其传递给 SDK。可以通过以下两种方法实现此目的:

  • 将密钥放在 GOOGLE_API_KEY 环境变量中(SDK 会自动从中选取密钥)。
  • 将密钥传递给 genai.configure(api_key=...)
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)

列出模型

现在,您可以调用 Gemini API 了。使用 list_models 查看可用的 Gemini 模型:

  • gemini-1.5-flash:我们最快的多模态模型
  • gemini-1.5-pro:我们最强大、最智能的多模态模型
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

根据文本输入生成文本

对于纯文字问题,请使用 Gemini 1.5 模型或 Gemini 1.0 Pro 模型:

model = genai.GenerativeModel('gemini-1.5-flash')

generate_content 方法可以处理各种用例,包括多轮聊天和多模态输入,具体取决于底层模型支持的内容。可用的模型仅支持输入文本和图片,输出文本。

在最简单的情况下,您可以将提示字符串传递给 GenerativeModel.generate_content 方法:

%%time
response = model.generate_content("What is the meaning of life?")
CPU times: user 110 ms, sys: 12.3 ms, total: 123 ms
Wall time: 8.25 s

在简单的情况下,您只需要 response.text 访问器即可。如需显示带格式的 Markdown 文本,请使用 to_markdown 函数:

to_markdown(response.text)

对于生命目的的疑问,几个世纪、不同文化和不同大洲的人们都为之感到困惑。虽然没有普遍认可的回应,但人们已经提出了许多想法,而回应通常取决于个人的想法、信念和生活经历。

  1. 幸福与健康:许多人认为,人生的目标是获得个人幸福感和幸福感。这可能包括寻找令人愉悦的追求、建立重要的关系、关爱一个人的身心健康,以及追求个人目标和兴趣。

  2. 有意义的贡献:有些人认为生活的目的是为世界做出有意义的贡献。这可能包括从事有益于他人的职业、参与志愿者或慈善活动、创作艺术或文学或发明。

  3. 自我实现和个人成长:追求自我实现和个人发展是人生中的另一个共同目标。这可能涉及学习新技能、突破个人界限、直面个人障碍以及个人成长。

  4. 伦理和道德行为:有些人认为人生的目标就是遵守道德和道德行为。这可能包括遵守道德原则、在困难时做正确行事,以及努力让世界更美好。

  5. 精神满足:对有些人来说,生命的目的与精神或宗教信仰有关。这可能包括寻求更高的权力联系、践行宗教仪式或遵循精神教义。

  6. 充分体验生命:有些人认为,人生的目标是体验所有一切。这可能包括旅行、尝试新事物、冒险和拥抱新的见面。

  7. 遗产与影响:也有人认为,生命的目的就是给世界留下持久的遗产和影响。这可能包括实现一些重要的成就、为他人的贡献而记住他们,或激励和激励他人。

  8. 找到平衡与和谐:对有些人来说,生活的目的就是在生活的方方面面找到平衡与和谐。这可能涉及到兼顾个人、职业和社会义务、寻求内心平静与满足,以及过上一个符合个人价值观和信仰的生活。

归根结底,生命的意义就是个人的旅程,不同的人可能会通过各自的体验、反思和与周围世界的互动来发现自己独特的用途。

如果该 API 未能返回结果,请使用 GenerateContentResponse.prompt_feedback 检查它是否因与提示有关的安全问题而被屏蔽。

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

Gemini 可以针对单个问题生成多个可能的回答。这些可能的回复称为“candidates”,您可以查看它们,以选择最合适的回复。

使用 GenerateContentResponse.candidates 查看候选响应:

response.candidates
[content {
  parts {
    text: "The query of life\'s purpose has perplexed people across centuries, cultures, and continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences.\n\n1. **Happiness and Well-being:** Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one\'s physical and mental health, and pursuing personal goals and interests.\n\n2. **Meaningful Contribution:** Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.\n\n3. **Self-realization and Personal Growth:** The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, pushing one\'s boundaries, confronting personal obstacles, and evolving as a person.\n\n4. **Ethical and Moral Behavior:** Some believe that the goal of life is to act ethically and morally. This might entail adhering to one\'s moral principles, doing the right thing even when it is difficult, and attempting to make the world a better place.\n\n5. **Spiritual Fulfillment:** For some, the purpose of life is connected to spiritual or religious beliefs. This might entail seeking a connection with a higher power, practicing religious rituals, or following spiritual teachings.\n\n6. **Experiencing Life to the Fullest:** Some individuals believe that the goal of life is to experience all that it has to offer. This might entail traveling, trying new things, taking risks, and embracing new encounters.\n\n7. **Legacy and Impact:** Others believe that the purpose of life is to leave a lasting legacy and impact on the world. This might entail accomplishing something noteworthy, being remembered for one\'s contributions, or inspiring and motivating others.\n\n8. **Finding Balance and Harmony:** For some, the purpose of life is to find balance and harmony in all aspects of their lives. This might entail juggling personal, professional, and social obligations, seeking inner peace and contentment, and living a life that is in accordance with one\'s values and beliefs.\n\nUltimately, the meaning of life is a personal journey, and different individuals may discover their own unique purpose through their experiences, reflections, and interactions with the world around them."
  }
  role: "model"
}
finish_reason: STOP
index: 0
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
]

默认情况下,模型会在完成整个生成过程后返回响应。您还可以在生成响应时对其进行流式传输,模型将在生成响应数据块后立即返回响应数据块。

如需流式传输回复,请使用 GenerativeModel.generate_content(..., stream=True)

%%time
response = model.generate_content("What is the meaning of life?", stream=True)
CPU times: user 102 ms, sys: 25.1 ms, total: 128 ms
Wall time: 7.94 s
for chunk in response:
  print(chunk.text)
  print("_"*80)
The query of life's purpose has perplexed people across centuries, cultures, and
________________________________________________________________________________
 continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences
________________________________________________________________________________
.

1. **Happiness and Well-being:** Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one's physical and mental health, and pursuing personal goals and aspirations.

2. **Meaning
________________________________________________________________________________
ful Contribution:** Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.

3. **Self-realization and Personal Growth:** The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, exploring one's interests and abilities, overcoming obstacles, and becoming the best version of oneself.

4. **Connection and Relationships:** For many individuals, the purpose of life is found in their relationships with others. This might entail building
________________________________________________________________________________
 strong bonds with family and friends, fostering a sense of community, and contributing to the well-being of those around them.

5. **Spiritual Fulfillment:** For those with religious or spiritual beliefs, the purpose of life may be centered on seeking spiritual fulfillment or enlightenment. This might entail following religious teachings, engaging in spiritual practices, or seeking a deeper understanding of the divine.

6. **Experiencing the Journey:** Some believe that the purpose of life is simply to experience the journey itself, with all its joys and sorrows. This perspective emphasizes embracing the present moment, appreciating life's experiences, and finding meaning in the act of living itself.

7. **Legacy and Impact:** For others, the goal of life is to leave a lasting legacy or impact on the world. This might entail making a significant contribution to a particular field, leaving a positive mark on future generations, or creating something that will be remembered and cherished long after one's lifetime.

Ultimately, the meaning of life is a personal and subjective question, and there is no single, universally accepted answer. It is about discovering what brings you fulfillment, purpose, and meaning in your own life, and living in accordance with those values.
________________________________________________________________________________

流式传输时,除非您遍历所有响应文本块,否则部分响应属性不可用。具体如下所示:

response = model.generate_content("What is the meaning of life?", stream=True)

prompt_feedback 属性的工作原理如下:

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

text 等属性不会:

try:
  response.text
except Exception as e:
  print(f'{type(e).__name__}: {e}')
IncompleteIterationError: Please let the response complete iteration before accessing the final accumulated
attributes (or call `response.resolve()`)

根据图片和文本输入生成文本

Gemini 提供了多种可处理多模态输入的模型(Gemini 1.5 模型),让您可以输入文字和图片。请务必查看提示的图片要求

当提示输入同时包含文本和图片时,将 Gemini 1.5 与 GenerativeModel.generate_content 方法结合使用来生成文本输出:

我们添加一张图片:

curl -o image.jpg https://t0.gstatic.com/licensed-image?q=tbn:ANd9GcQ_Kevbk21QBRy-PgB4kQpS79brbmmEG7m3VOTShAn4PecDU5H5UxrJxE3Dw1JiaG17V88QIol19-3TM2wCHw
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  405k  100  405k    0     0  6982k      0 --:--:-- --:--:-- --:--:-- 7106k
import PIL.Image

img = PIL.Image.open('image.jpg')
img

png

使用 Gemini 1.5 模型,并通过 generate_content 将图片传递给模型。

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(img)

to_markdown(response.text)

供应红米、烤西兰花和甜椒的鸡肉照烧料理碗。

如需在提示中同时提供文本和图片,请传递一个包含字符串和图片的列表:

response = model.generate_content(["Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping.", img], stream=True)
response.resolve()
to_markdown(response.text)

备餐是节省时间和金钱的好方法,还可以帮助你保持健康的饮食习惯。这道餐食是一顿健康美味的一顿饭,可以提前轻松准备。

这道餐点以红米、烤蔬菜和鸡肉照烧为特色。糙米是一种富含纤维和营养的全谷物。烤蔬菜是获取每日维生素和矿物质的好方法。照烧鸡肉是一种富含精瘦蛋白质的食物,口味也很浓郁。

这道餐食很容易提前准备。做法很简单,只要:煮糙米、烤蔬菜,再烤制鸡肉照烧。然后,将膳食分成单独的容器,并存放在冰箱中。准备好吃饭时,只需拿起容器并加热即可。

对于那些追求健康美味的饮食方式的忙碌人士来说,这道餐食是很好的选择。对于想要减肥或保持健康体重的人来说,这也是一种美味的食物。

如果你想要一顿可轻松制备的健康美味餐食,这道餐食是个不错的选择。立即试用!

聊天对话

通过 Gemini,您可以跨多个回合进行自由形式的对话。ChatSession 类通过管理对话的状态来简化该过程,因此与 generate_content 不同,您无需将对话历史记录存储为列表。

初始化聊天:

model = genai.GenerativeModel('gemini-1.5-flash')
chat = model.start_chat(history=[])
chat
<google.generativeai.generative_models.ChatSession at 0x7b7b68250100>

ChatSession.send_message 方法会返回与 GenerativeModel.generate_content 相同的 GenerateContentResponse 类型。它还会将您的消息和回复附加到聊天记录中:

response = chat.send_message("In one sentence, explain how a computer works to a young child.")
to_markdown(response.text)

计算机就像一台非常智能的机器,可以理解和遵循我们的指示,帮助我们完成工作,甚至和我们一起玩游戏!

chat.history
[parts {
   text: "In one sentence, explain how a computer works to a young child."
 }
 role: "user",
 parts {
   text: "A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!"
 }
 role: "model"]

您可以继续发送消息以继续对话。使用 stream=True 参数流式传输聊天内容:

response = chat.send_message("Okay, how about a more detailed explanation to a high schooler?", stream=True)

for chunk in response:
  print(chunk.text)
  print("_"*80)
A computer works by following instructions, called a program, which tells it what to
________________________________________________________________________________
 do. These instructions are written in a special language that the computer can understand, and they are stored in the computer's memory. The computer's processor
________________________________________________________________________________
, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program's logic. The results of these calculations and decisions are then displayed on the computer's screen or stored in memory for later use.

To give you a simple analogy, imagine a computer as a
________________________________________________________________________________
 chef following a recipe. The recipe is like the program, and the chef's actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).

In summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results.
________________________________________________________________________________

genai.protos.Content 对象包含 genai.protos.Part 对象列表,每个对象都包含一个文本(字符串)或 inline_data (genai.protos.Blob),其中 blob 中包含二进制数据和一个 mime_type。聊天记录以 ChatSession.history 中的 genai.protos.Content 对象列表的形式提供:

for message in chat.history:
  display(to_markdown(f'**{message.role}**: {message.parts[0].text}'))

用户:用一句话解释计算机如何给年幼的孩子解释。

model:计算机就像一台非常智能的机器,可以理解和遵循我们的指示,帮助我们完成工作,甚至还能和我们一起玩游戏!

用户:好的,不如为高中生提供更多详细解释吗?

model:计算机是按照指令(称为程序)来运行的,该程序会指示它该做什么。这些指令是用计算机可以理解的特殊语言编写的,存储在计算机的内存中。计算机的处理器(也称 CPU)会从内存中读取指令并执行指令,执行计算,并根据程序逻辑作出决策。然后,这些计算和决策的结果会显示在计算机的屏幕上,或存储在内存中以供日后使用。

举一个简单的比方,就是将计算机想象成按照食谱烹饪的厨师。食谱就像程序,厨师的操作就像计算机要遵循的指令。厨师会读取食谱(程序),并执行诸如收集食材(从内存中提取数据)、混合在一起(执行计算)和烹饪(处理数据)等操作。然后,将最终菜肴(输出)放在盘子(计算机屏幕)上。

总而言之,计算机通过执行存储在其内存中的一系列指令来执行计算、做出决策以及显示或存储结果。

计算词元数量

大语言模型具有上下文窗口,上下文长度通常根据词元数量来衡量。借助 Gemini API,您可以确定每个任意 genai.protos.Content 对象的词元数量。在最简单的情况下,您可以将查询字符串传递给 GenerativeModel.count_tokens 方法,如下所示:

model.count_tokens("What is the meaning of life?")
total_tokens: 7

同样,您也可以检查 ChatSessiontoken_count

model.count_tokens(chat.history)
total_tokens: 501

使用嵌入

嵌入是一种技术,用于将信息表示为数组中的浮点数列表。借助 Gemini,您能够以矢量化形式表示文本(字词、句子和文本块),从而更轻松地比较和对比嵌入。例如,主题或情感相似的两个文本应该具有相似的嵌入,这些嵌入可以通过余弦相似度等数学比较技术来识别。如需详细了解如何以及为何应使用嵌入,请参阅嵌入指南

使用 embed_content 方法生成嵌入。此方法可处理以下任务的嵌入 (task_type):

任务类型 说明
RETRIEVAL_QUERY 将给定文本指定为搜索/检索设置中的查询。
RETRIEVAL_DOCUMENT 将给定文本指定为搜索/检索设置中的文档。使用此任务类型需要 title
SEMANTIC_SIMILARITY 指定给定文本用于语义文本相似度 (STS)。
分类 指定嵌入用于分类。
集群 指定嵌入用于聚类。

以下代码为文档检索的单个字符串生成嵌入:

result = genai.embed_content(
    model="models/embedding-001",
    content="What is the meaning of life?",
    task_type="retrieval_document",
    title="Embedding of single string")

# 1 input > 1 vector output
print(str(result['embedding'])[:50], '... TRIMMED]')
[-0.003216741, -0.013358698, -0.017649598, -0.0091 ... TRIMMED]

如需处理批量字符串,请在 content 中传递字符串列表:

result = genai.embed_content(
    model="models/embedding-001",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'],
    task_type="retrieval_document",
    title="Embedding of list of strings")

# A list of inputs > A list of vectors output
for v in result['embedding']:
  print(str(v)[:50], '... TRIMMED ...')
[0.0040260437, 0.004124458, -0.014209415, -0.00183 ... TRIMMED ...
[-0.004049845, -0.0075574904, -0.0073463684, -0.03 ... TRIMMED ...
[0.025310587, -0.0080734305, -0.029902633, 0.01160 ... TRIMMED ...

虽然 genai.embed_content 函数接受简单的字符串或字符串列表,但它实际上是围绕 genai.protos.Content 类型(如 GenerativeModel.generate_content)构建的。genai.protos.Content 对象是 API 中的主要对话单元。

虽然 genai.protos.Content 对象是多模态,但 embed_content 方法仅支持文本嵌入。这种设计使该 API 有可能扩展到多模态嵌入。

response.candidates[0].content
parts {
  text: "A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer\'s memory. The computer\'s processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program\'s logic. The results of these calculations and decisions are then displayed on the computer\'s screen or stored in memory for later use.\n\nTo give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef\'s actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).\n\nIn summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results."
}
role: "model"
result = genai.embed_content(
    model = 'models/embedding-001',
    content = response.candidates[0].content)

# 1 input > 1 vector output
print(str(result['embedding'])[:50], '... TRIMMED ...')
[-0.013921871, -0.03504407, -0.0051786783, 0.03113 ... TRIMMED ...

同样,聊天记录包含 genai.protos.Content 对象列表,您可以直接将其传递给 embed_content 函数:

chat.history
[parts {
   text: "In one sentence, explain how a computer works to a young child."
 }
 role: "user",
 parts {
   text: "A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!"
 }
 role: "model",
 parts {
   text: "Okay, how about a more detailed explanation to a high schooler?"
 }
 role: "user",
 parts {
   text: "A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer\'s memory. The computer\'s processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program\'s logic. The results of these calculations and decisions are then displayed on the computer\'s screen or stored in memory for later use.\n\nTo give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef\'s actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).\n\nIn summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results."
 }
 role: "model"]
result = genai.embed_content(
    model = 'models/embedding-001',
    content = chat.history)

# 1 input > 1 vector output
for i,v in enumerate(result['embedding']):
  print(str(v)[:50], '... TRIMMED...')
[-0.014632266, -0.042202696, -0.015757175, 0.01548 ... TRIMMED...
[-0.010979066, -0.024494737, 0.0092659835, 0.00803 ... TRIMMED...
[-0.010055617, -0.07208932, -0.00011750793, -0.023 ... TRIMMED...
[-0.013921871, -0.03504407, -0.0051786783, 0.03113 ... TRIMMED...

高级用例

以下部分讨论了适用于 Gemini API 的 Python SDK 的高级用例和更低级别的详细信息。

安全设置

借助 safety_settings 参数,您可以配置模型在提示和响应中屏蔽和允许的内容。默认情况下,安全设置会在所有维度上屏蔽不安全内容的中等和/或高概率。详细了解安全设置

输入有问题的提示,使用默认安全设置运行模型,此时模型不会返回任何候选字词:

response = model.generate_content('[Questionable prompt here]')
response.candidates
[content {
  parts {
    text: "I\'m sorry, but this prompt involves a sensitive topic and I\'m not allowed to generate responses that are potentially harmful or inappropriate."
  }
  role: "model"
}
finish_reason: STOP
index: 0
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
]

prompt_feedback 会告诉您是哪个安全过滤器屏蔽了提示:

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

现在,使用新配置的安全设置向模型提供相同的提示,您可能会收到回复。

response = model.generate_content('[Questionable prompt here]',
                                  safety_settings={'HARASSMENT':'block_none'})
response.text

另请注意,如果提示通过,但各个回复未通过安全检查,则每个候选人都有自己的 safety_ratings

对邮件进行编码

前面部分依赖于 SDK 来让您轻松向 API 发送提示。本部分提供了与上一个示例等效的全类型文档,以便您更好地了解 SDK 如何对消息进行编码的较低级别详细信息。

Python SDK 的基础是 google.ai.generativelanguage 客户端库:

SDK 会尝试将您的消息转换为 genai.protos.Content 对象,其中包含一个 genai.protos.Part 对象列表,其中每个对象都包含以下一项:

  1. text(字符串)
  2. inline_data (genai.protos.Blob),其中 blob 包含二进制 datamime_type

您还可以将其中任何类作为等效的字典传递。

因此,与上例等效的完全类型是:

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(
    genai.protos.Content(
        parts = [
            genai.protos.Part(text="Write a short, engaging blog post based on this picture."),
            genai.protos.Part(
                inline_data=genai.protos.Blob(
                    mime_type='image/jpeg',
                    data=pathlib.Path('image.jpg').read_bytes()
                )
            ),
        ],
    ),
    stream=True)
response.resolve()

to_markdown(response.text[:100] + "... [TRIMMED] ...")

备餐是节省时间和金钱的好方法,还可以帮助你保持健康的饮食习惯。通过 ... [剪辑] ...

多轮对话

虽然前面显示的 genai.ChatSession 类可以处理许多用例,但确实做出了一些假设。如果您的用例不适合此聊天实现,请记住 genai.ChatSession 只是 GenerativeModel.generate_content 的封装容器。除了处理单一请求外,它还可以处理多轮对话。

各个消息是 genai.protos.Content 对象或兼容的字典,如前几部分所述。作为字典,消息需要 roleparts 键。对话中的 role 可以是提供提示的 user,也可以是提供回答的 model

传递 genai.protos.Content 对象列表,系统会将其视为多轮聊天:

model = genai.GenerativeModel('gemini-1.5-flash')

messages = [
    {'role':'user',
     'parts': ["Briefly explain how a computer works to a young child."]}
]
response = model.generate_content(messages)

to_markdown(response.text)

不妨将计算机想象成一位非常聪明的朋友,它可以帮你处理各种事情。就像您有思考和学习的大脑,计算机也有大脑,称为处理器。它就像是电脑的老板,告诉它做什么。

计算机中有一个名为“内存”的特殊位置,它就像一个大存储盒。它会记住你下达的所有指令,例如打开游戏或播放视频。

当您按下键盘上的按钮或使用鼠标点击屏幕上的内容时,都会向计算机发送消息。这些信息通过特殊电线(称为电缆)传输到处理器。

处理器会读取消息并告知计算机应执行的操作。它可以打开程序、显示图片,甚至播放音乐。

您在屏幕上显示的所有内容都是由显卡创建的,就像计算机内部的魔术师一样。它会根据处理器的指令将它们转换成彩色图片和视频。

为了保存您喜爱的游戏、视频或图片,计算机会使用一个称为硬盘的特殊存储空间。它就像一个巨大的图书馆,可供计算机用来管理您的所有贵重物品。

当您想连接到互联网与朋友一起玩游戏或观看有趣的视频时,计算机会使用所谓的网卡,通过互联网线或 Wi-Fi 信号收发消息。

因此,就像您的大脑帮助您学习和娱乐一样,计算机的处理器、内存、显卡、硬盘和网卡将共同发挥作用,将您的计算机打造成超聪明的计算机,可以助您实现超凡事!

如要继续对话,请添加回复和另一封邮件。

messages.append({'role':'model',
                 'parts':[response.text]})

messages.append({'role':'user',
                 'parts':["Okay, how about a more detailed explanation to a high school student?"]})

response = model.generate_content(messages)

to_markdown(response.text)

从本质上讲,计算机是一种可通过编程执行一系列指令的机器。它由几个协同处理、存储和显示信息的重要组件组成:

1. 处理器 (CPU):- 计算机的大脑。 - 执行指令和计算。 - 以千兆赫 (GHz) 为单位测量的速度。 - 一般而言,GHz 越高,处理速度越快。

2. 内存 (RAM): - 用于处理正在处理的数据的临时存储空间。 - 保存程序运行时的指令和数据。 - 以千兆字节 (GB) 为单位。 - 更大的 RAM 可允许更多程序同时运行。

3. 存储 (HDD/SSD): - 数据的永久存储。 - 存储操作系统、程序和用户文件。 - 以千兆字节 (GB) 或太字节 (TB) 为单位。 - 普通硬盘 (HDD) 属于传统硬盘,速度较慢,价格也较低。 - 固态硬盘 (SSD) 更新、速度更快,价格也更高。

4. 显卡 (GPU): - 处理和显示图像。 - 执行游戏、视频编辑和其他图形密集型任务时必不可少。 - 以视频 RAM (VRAM) 和时钟速度来衡量。

5. 主板: - 连接所有组件。 - 提供权力和通信途径。

6. 输入/输出 (I/O) 设备: - 允许用户与计算机互动。 - 示例:键盘、鼠标、显示器、打印机。

7. 操作系统 (OS):- 用于管理计算机资源的软件。 - 提供界面和基本功能。 - 示例:Windows、macOS、Linux。

当您在计算机上运行程序时,会发生以下情况:

  1. 程序指令从存储空间加载到内存中。
  2. 处理器从内存读取指令并逐个执行。
  3. 如果指令涉及计算,处理器会使用其算术逻辑单元 (ALU) 执行计算。
  4. 如果指令涉及数据,处理器会读取或写入内存。
  5. 计算或数据操纵的结果存储在内存中。
  6. 如果程序需要在屏幕上显示内容,它会向显卡发送必要的数据。
  7. 显卡会处理数据,并将其发送到显示器,显示器上显示该数据。

此过程会一直持续到程序完成其任务或用户终止它。

生成配置

借助 generation_config 实参,您可以修改生成形参。您发送到模型的每个提示都包含参数值,用于控制模型生成回答的方式。

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(
    'Tell me a story about a magic backpack.',
    generation_config=genai.types.GenerationConfig(
        # Only one candidate for now.
        candidate_count=1,
        stop_sequences=['x'],
        max_output_tokens=20,
        temperature=1.0)
)
text = response.text

if response.candidates[0].finish_reason.name == "MAX_TOKENS":
    text += '...'

to_markdown(text)

很久很久以前,在一座坐落在郁郁葱葱的山丘中的小镇,住着一个名叫...

后续步骤

  • 提示设计是指创建可从语言模型中引发所需回答的提示的过程。撰写结构合理的提示是确保语言模型给出准确、高质量的回答的重要一环。了解撰写提示的最佳做法。
  • Gemini 提供多种模型变体,以满足不同应用场景的需求,例如输入类型和复杂程度、聊天或其他对话语言任务的实现以及大小限制。了解可用的 Gemini 模型