Get started with the Gemini API: Python

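This quickstart demonstrates how to use the Python SDK for the Gemini API, which gives you access to Google's Gemini large language models. In this quickstart, you will learn how to:

  1. Set up your development environment and API access to use Gemini.
  2. Generate text responses from text inputs.
  3. Generate text responses from multimodal inputs (text and images).
  4. Use Gemini for multi-turn conversations (chat).
  5. Use embeddings for large language models.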

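Prerequisites

You can run this quickstart in Google Colab, which runs this notebook directly in the browser and does not require additional environment configuration.

Alternatively, to complete this quickstart locally, ensure that your development environment meets the following requirements:

  • Python 3.9+
  • An installation of jupyter to run the notebook.

Setup

Install the Python SDK

The Python SDK for the Gemini API is contained in the google-generativeai package. Install the dependency using pip: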

pip install -q -U google-generativeai

Import packages

Import the necessary packages.

import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

# Used to securely store your API key
from google.colab import userdata

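Set up your API key

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.

Get an API key

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name GOOGLE_API_KEY.

Once you have the API key, pass it to the SDK. You can do this in two ways:

  • Put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there).
  • Pass the key to genai.configure(api_key=...)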
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)
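
Outside Colab, the environment-variable route mentioned above can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: the function name load_api_key is hypothetical, and it only reads the variable and fails early with a clear message when the key is missing.

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name` in an environment mapping.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before calling the API.")
    return key


# Typical use (requires the SDK and a real key):
#   genai.configure(api_key=load_api_key())
```

Failing at startup keeps a missing key from surfacing later as a less obvious authentication error.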

List models

Now you're ready to call the Gemini API. Use list_models to see the available Gemini models:

  • gemini-pro: optimized for text-only prompts.
  • gemini-pro-vision: optimized for text-and-images prompts.
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

The genai package also supports the PaLM family of models, but only the Gemini models support the generic, multimodal capabilities of the generateContent method.
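
The filter in the loop above can be factored into a reusable function. This is a sketch that relies only on the two attributes the loop already uses (name and supported_generation_methods); the helper name and the stand-in objects are hypothetical and exist so the example runs without an API key.

```python
from types import SimpleNamespace


def names_supporting(models, method="generateContent"):
    """Return the names of models whose supported_generation_methods include `method`."""
    return [m.name for m in models if method in m.supported_generation_methods]


# Stand-in objects for illustration only; real entries come from genai.list_models().
sample_models = [
    SimpleNamespace(name="models/gemini-pro",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/embedding-001",
                    supported_generation_methods=["embedContent"]),
]
print(names_supporting(sample_models))
```

With the SDK configured, the same function could be called as names_supporting(genai.list_models()).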

Generate text from text inputs

For text-only prompts, use the gemini-pro model:

model = genai.GenerativeModel('gemini-pro')

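The generate_content method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. The available models only support text and images as input, and text as output.

In the simplest case, you can pass a prompt string to the GenerativeModel.generate_content method: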

%%time
response = model.generate_content("What is the meaning of life?")
CPU times: user 110 ms, sys: 12.3 ms, total: 123 ms
Wall time: 8.25 s

In simple cases, the response.text accessor is all you need. To display formatted Markdown text, use the to_markdown function:

to_markdown(response.text)

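The query of life's purpose has perplexed people across centuries, cultures, and continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences.

  1. Happiness and Well-being: Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one's physical and mental health, and pursuing personal goals and interests.

  2. Meaningful Contribution: Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.

  3. Self-realization and Personal Growth: The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, pushing one's boundaries, confronting personal obstacles, and evolving as a person.

  4. Ethical and Moral Behavior: Some believe that the goal of life is to act ethically and morally. This might entail adhering to one's moral principles, doing the right thing even when it is difficult, and attempting to make the world a better place.

  5. Spiritual Fulfillment: For some, the purpose of life is connected to spiritual or religious beliefs. This might entail seeking a connection with a higher power, practicing religious rituals, or following spiritual teachings.

  6. Experiencing Life to the Fullest: Some individuals believe that the goal of life is to experience all that it has to offer. This might entail traveling, trying new things, taking risks, and embracing new encounters.

  7. Legacy and Impact: Others believe that the purpose of life is to leave a lasting legacy and impact on the world. This might entail accomplishing something noteworthy, being remembered for one's contributions, or inspiring and motivating others.

  8. Finding Balance and Harmony: For some, the purpose of life is to find balance and harmony in all aspects of their lives. This might entail juggling personal, professional, and social obligations, seeking inner peace and contentment, and living a life that is in accordance with one's values and beliefs.

Ultimately, the meaning of life is a personal journey, and different individuals may discover their own unique purpose through their experiences, reflections, and interactions with the world around them.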

If the API failed to return a result, use GenerateContentResponse.prompt_feedback to see whether it was blocked due to safety concerns regarding the prompt.

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

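Gemini can generate multiple possible responses for a single prompt. These possible responses are called candidates, and you can review them to select the most suitable one as the response.

View the response candidates with GenerateContentResponse.candidates: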

response.candidates
[content {
  parts {
    text: "The query of life\'s purpose has perplexed people across centuries, cultures, and continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences.\n\n1. **Happiness and Well-being:** Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one\'s physical and mental health, and pursuing personal goals and interests.\n\n2. **Meaningful Contribution:** Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.\n\n3. **Self-realization and Personal Growth:** The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, pushing one\'s boundaries, confronting personal obstacles, and evolving as a person.\n\n4. **Ethical and Moral Behavior:** Some believe that the goal of life is to act ethically and morally. This might entail adhering to one\'s moral principles, doing the right thing even when it is difficult, and attempting to make the world a better place.\n\n5. **Spiritual Fulfillment:** For some, the purpose of life is connected to spiritual or religious beliefs. This might entail seeking a connection with a higher power, practicing religious rituals, or following spiritual teachings.\n\n6. **Experiencing Life to the Fullest:** Some individuals believe that the goal of life is to experience all that it has to offer. This might entail traveling, trying new things, taking risks, and embracing new encounters.\n\n7. **Legacy and Impact:** Others believe that the purpose of life is to leave a lasting legacy and impact on the world. 
This might entail accomplishing something noteworthy, being remembered for one\'s contributions, or inspiring and motivating others.\n\n8. **Finding Balance and Harmony:** For some, the purpose of life is to find balance and harmony in all aspects of their lives. This might entail juggling personal, professional, and social obligations, seeking inner peace and contentment, and living a life that is in accordance with one\'s values and beliefs.\n\nUltimately, the meaning of life is a personal journey, and different individuals may discover their own unique purpose through their experiences, reflections, and interactions with the world around them."
  }
  role: "model"
}
finish_reason: STOP
index: 0
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
]

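By default, the model returns a response after completing the entire generation process. You can also stream the response as it is being generated, and the model will return chunks of the response as soon as they are generated.

To stream responses, use GenerativeModel.generate_content(..., stream=True).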

%%time
response = model.generate_content("What is the meaning of life?", stream=True)
CPU times: user 102 ms, sys: 25.1 ms, total: 128 ms
Wall time: 7.94 s
for chunk in response:
  print(chunk.text)
  print("_"*80)
The query of life's purpose has perplexed people across centuries, cultures, and
________________________________________________________________________________
 continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences
________________________________________________________________________________
.

1. **Happiness and Well-being:** Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one's physical and mental health, and pursuing personal goals and aspirations.

2. **Meaning
________________________________________________________________________________
ful Contribution:** Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.

3. **Self-realization and Personal Growth:** The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, exploring one's interests and abilities, overcoming obstacles, and becoming the best version of oneself.

4. **Connection and Relationships:** For many individuals, the purpose of life is found in their relationships with others. This might entail building
________________________________________________________________________________
 strong bonds with family and friends, fostering a sense of community, and contributing to the well-being of those around them.

5. **Spiritual Fulfillment:** For those with religious or spiritual beliefs, the purpose of life may be centered on seeking spiritual fulfillment or enlightenment. This might entail following religious teachings, engaging in spiritual practices, or seeking a deeper understanding of the divine.

6. **Experiencing the Journey:** Some believe that the purpose of life is simply to experience the journey itself, with all its joys and sorrows. This perspective emphasizes embracing the present moment, appreciating life's experiences, and finding meaning in the act of living itself.

7. **Legacy and Impact:** For others, the goal of life is to leave a lasting legacy or impact on the world. This might entail making a significant contribution to a particular field, leaving a positive mark on future generations, or creating something that will be remembered and cherished long after one's lifetime.

Ultimately, the meaning of life is a personal and subjective question, and there is no single, universally accepted answer. It is about discovering what brings you fulfillment, purpose, and meaning in your own life, and living in accordance with those values.
________________________________________________________________________________

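When streaming, some response attributes are not available until you've iterated through all the response chunks. This is demonstrated below: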

response = model.generate_content("What is the meaning of life?", stream=True)

The prompt_feedback attribute is available right away:

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

But attributes like text are not yet available:

try:
  response.text
except Exception as e:
  print(f'{type(e).__name__}: {e}')
IncompleteIterationError: Please let the response complete iteration before accessing the final accumulated
attributes (or call `response.resolve()`)
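
One way to avoid this error is to consume the stream before reading accumulated attributes. The sketch below iterates over the chunks, optionally hands each piece to a callback as it arrives, and returns the concatenated text. It uses only the chunk.text attribute shown earlier; the helper name collect_stream and the stand-in chunks are hypothetical.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_chunk=None):
    """Iterate over streamed chunks and return the concatenated text.

    `on_chunk`, if given, is called with each piece of text as it arrives,
    which allows incremental display while still building the full result.
    """
    pieces = []
    for chunk in chunks:
        if on_chunk is not None:
            on_chunk(chunk.text)
        pieces.append(chunk.text)
    return "".join(pieces)


# Stand-in chunks for illustration; a real stream comes from
# model.generate_content(..., stream=True).
fake_stream = [SimpleNamespace(text="The query of life's purpose "),
               SimpleNamespace(text="has perplexed people.")]
full_text = collect_stream(fake_stream)
```

As the error message itself notes, calling response.resolve() is the built-in alternative when incremental display is not needed.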

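Generate text from image and text inputs

Gemini provides a multimodal model (gemini-pro-vision) that accepts both text and images as input. The GenerativeModel.generate_content API is designed to handle multimodal prompts and returns a text output.

Let's include an image: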

curl -o image.jpg https://t0.gstatic.com/licensed-image?q=tbn:ANd9GcQ_Kevbk21QBRy-PgB4kQpS79brbmmEG7m3VOTShAn4PecDU5H5UxrJxE3Dw1JiaG17V88QIol19-3TM2wCHw
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  405k  100  405k    0     0  6982k      0 --:--:-- --:--:-- --:--:-- 7106k
import PIL.Image

img = PIL.Image.open('image.jpg')
img

Use the gemini-pro-vision model and pass the image to the model with generate_content.

model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content(img)

to_markdown(response.text)

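Chicken Teriyaki Meal Prep Bowls with brown rice, roasted broccoli and bell peppers.

To provide both text and images in a prompt, pass a list containing the strings and images: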

response = model.generate_content(["Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping.", img], stream=True)
response.resolve()
to_markdown(response.text)

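Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time.

This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a whole grain that is high in fiber and nutrients. The roasted vegetables are a great way to get your daily dose of vitamins and minerals. And the chicken teriyaki is a lean protein that is also packed with flavor.

This meal is easy to prepare ahead of time. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Then, divide the meal into individual containers and store them in the refrigerator. When you're ready to eat, simply grab a container and heat it up.

This meal is a great option for busy people who are looking for a healthy and delicious way to eat. It's also a great meal for those who are trying to lose weight or maintain a healthy weight.

If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. Give it a try today!

Chat conversations

Gemini enables you to have freeform conversations across multiple turns. The ChatSession class simplifies the process by managing the state of the conversation, so unlike with generate_content, you do not have to store the conversation history as a list.

Initialize the chat: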

model = genai.GenerativeModel('gemini-pro')
chat = model.start_chat(history=[])
chat
<google.generativeai.generative_models.ChatSession at 0x7b7b68250100>

The ChatSession.send_message method returns the same GenerateContentResponse type as GenerativeModel.generate_content. It also appends your message and the response to the chat history:

response = chat.send_message("In one sentence, explain how a computer works to a young child.")
to_markdown(response.text)

A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!

chat.history
[parts {
   text: "In one sentence, explain how a computer works to a young child."
 }
 role: "user",
 parts {
   text: "A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!"
 }
 role: "model"]

You can keep sending messages to continue the conversation. Use the stream=True argument to stream the chat:

response = chat.send_message("Okay, how about a more detailed explanation to a high schooler?", stream=True)

for chunk in response:
  print(chunk.text)
  print("_"*80)
A computer works by following instructions, called a program, which tells it what to
________________________________________________________________________________
 do. These instructions are written in a special language that the computer can understand, and they are stored in the computer's memory. The computer's processor
________________________________________________________________________________
, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program's logic. The results of these calculations and decisions are then displayed on the computer's screen or stored in memory for later use.

To give you a simple analogy, imagine a computer as a
________________________________________________________________________________
 chef following a recipe. The recipe is like the program, and the chef's actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).

In summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results.
________________________________________________________________________________

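glm.Content objects contain a list of glm.Part objects that each contain either a text (string) or inline_data (glm.Blob), where a blob contains binary data and a mime_type. The chat history is available as a list of glm.Content objects in ChatSession.history: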

for message in chat.history:
  display(to_markdown(f'**{message.role}**: {message.parts[0].text}'))

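user: In one sentence, explain how a computer works to a young child.

model: A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!

user: Okay, how about a more detailed explanation to a high schooler?

model: A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer's memory. The computer's processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program's logic. The results of these calculations and decisions are then displayed on the computer's screen or stored in memory for later use.

To give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef's actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).

In summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results.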
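
The display loop above reads only parts[0] of each message. As a sketch of a slightly more general approach, the helper below joins the text of every part and returns a plain-text transcript. It assumes each history entry exposes role and parts, and each part exposes text, as described above; the name format_history and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def format_history(history):
    """Render chat history as 'role: text' lines, joining the text of all parts."""
    lines = []
    for message in history:
        # Parts without text (for example image blobs) contribute an empty string.
        text = "".join(getattr(part, "text", "") or "" for part in message.parts)
        lines.append(f"{message.role}: {text}")
    return "\n".join(lines)


# Stand-in history for illustration; a real one comes from chat.history.
fake_history = [
    SimpleNamespace(role="user", parts=[SimpleNamespace(text="Hello, ")]),
    SimpleNamespace(role="model", parts=[SimpleNamespace(text="Hi "),
                                         SimpleNamespace(text="there!")]),
]
print(format_history(fake_history))
```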

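Count tokens

Large language models have a context window, and the context length is often measured in terms of the number of tokens. With the Gemini API, you can determine the number of tokens for any glm.Content object. In the simplest case, you can pass a query string to the GenerativeModel.count_tokens method as follows: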

model.count_tokens("What is the meaning of life?")
total_tokens: 7

Similarly, you can check the token count for your ChatSession by passing its history:

model.count_tokens(chat.history)
total_tokens: 501
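
Token counts become useful when a conversation approaches the context window. The following is a minimal sketch of one possible policy, dropping the oldest messages until the total fits a budget. The function trim_to_budget and the word-based counter are hypothetical; with the SDK, the counter could instead be something like lambda msgs: model.count_tokens(msgs).total_tokens, at the cost of one API request per check.

```python
def trim_to_budget(messages, count_tokens, budget):
    """Drop the oldest messages until count_tokens(messages) <= budget.

    `count_tokens` is any callable that maps a list of messages to an integer.
    """
    kept = list(messages)  # copy so the caller's list is left untouched
    while kept and count_tokens(kept) > budget:
        kept.pop(0)
    return kept


def word_count(messages):
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return sum(len(m.split()) for m in messages)


history = ["a b c", "d e", "f"]
print(trim_to_budget(history, word_count, budget=3))
```

Dropping single messages can leave a conversation that starts with a model turn, so a real implementation may prefer to remove user/model pairs together.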

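Use embeddings

Embedding is a technique used to represent information as a list of floating point numbers in an array. With Gemini, you can represent text (words, sentences, and blocks of text) in a vectorized form, making it easier to compare and contrast embeddings. For example, two texts that share a similar subject matter or sentiment should have similar embeddings, which can be identified through mathematical comparison techniques such as cosine similarity. For more on how and why you should use embeddings, refer to the Embeddings guide.

Use the embed_content method to generate embeddings. The method handles embedding for the following tasks (task_type):

Task Type             Description
RETRIEVAL_QUERY       Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT    Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title.
SEMANTIC_SIMILARITY   Specifies the given text will be used for Semantic Textual Similarity (STS).
CLASSIFICATION        Specifies that the embeddings will be used for classification.
CLUSTERING            Specifies that the embeddings will be used for clustering.

The following generates an embedding for a single string for document retrieval: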

result = genai.embed_content(
    model="models/embedding-001",
    content="What is the meaning of life?",
    task_type="retrieval_document",
    title="Embedding of single string")

# 1 input > 1 vector output
print(str(result['embedding'])[:50], '... TRIMMED]')
[-0.003216741, -0.013358698, -0.017649598, -0.0091 ... TRIMMED]

To handle batches of strings, pass a list of strings in content:

result = genai.embed_content(
    model="models/embedding-001",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'],
    task_type="retrieval_document",
    title="Embedding of list of strings")

# A list of inputs > A list of vectors output
for v in result['embedding']:
  print(str(v)[:50], '... TRIMMED ...')
[0.0040260437, 0.004124458, -0.014209415, -0.00183 ... TRIMMED ...
[-0.004049845, -0.0075574904, -0.0073463684, -0.03 ... TRIMMED ...
[0.025310587, -0.0080734305, -0.029902633, 0.01160 ... TRIMMED ...
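
Vectors such as those in result['embedding'] can be compared with cosine similarity, the technique mentioned at the start of this section. The function below is a plain-Python sketch for illustration; it is not part of the SDK, and for large batches a numerical library would be the more usual choice.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# With real embeddings this could be, for example:
#   cosine_similarity(result['embedding'][0], result['embedding'][2])
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate little directional similarity.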

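While the genai.embed_content function accepts simple strings or lists of strings, it is actually built around the glm.Content type (like GenerativeModel.generate_content). glm.Content objects are the primary units of conversation in the API.

While the glm.Content object is multimodal, the embed_content method only supports text embeddings. This design leaves room for the API to expand to multimodal embeddings.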

response.candidates[0].content
parts {
  text: "A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer\'s memory. The computer\'s processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program\'s logic. The results of these calculations and decisions are then displayed on the computer\'s screen or stored in memory for later use.\n\nTo give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef\'s actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).\n\nIn summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results."
}
role: "model"
result = genai.embed_content(
    model = 'models/embedding-001',
    content = response.candidates[0].content)

# 1 input > 1 vector output
print(str(result['embedding'])[:50], '... TRIMMED ...')
[-0.013921871, -0.03504407, -0.0051786783, 0.03113 ... TRIMMED ...

Similarly, the chat history contains a list of glm.Content objects, which you can pass directly to the embed_content function:

chat.history
[parts {
   text: "In one sentence, explain how a computer works to a young child."
 }
 role: "user",
 parts {
   text: "A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!"
 }
 role: "model",
 parts {
   text: "Okay, how about a more detailed explanation to a high schooler?"
 }
 role: "user",
 parts {
   text: "A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer\'s memory. The computer\'s processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program\'s logic. The results of these calculations and decisions are then displayed on the computer\'s screen or stored in memory for later use.\n\nTo give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef\'s actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).\n\nIn summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results."
 }
 role: "model"]
result = genai.embed_content(
    model = 'models/embedding-001',
    content = chat.history)

# A list of inputs > A list of vectors output
for v in result['embedding']:
  print(str(v)[:50], '... TRIMMED...')
[-0.014632266, -0.042202696, -0.015757175, 0.01548 ... TRIMMED...
[-0.010979066, -0.024494737, 0.0092659835, 0.00803 ... TRIMMED...
[-0.010055617, -0.07208932, -0.00011750793, -0.023 ... TRIMMED...
[-0.013921871, -0.03504407, -0.0051786783, 0.03113 ... TRIMMED...

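Advanced use cases

The following sections discuss advanced use cases and lower-level details of the Python SDK for the Gemini API.

Safety settings

The safety_settings argument lets you configure what the model blocks and allows in both prompts and responses. By default, safety settings block content with medium and/or high probability of being unsafe content across all dimensions. Learn more about Safety settings.

Enter a questionable prompt and run the model with the default safety settings. The model may return no candidates; in the run shown here it returned a single candidate containing a refusal: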

response = model.generate_content('[Questionable prompt here]')
response.candidates
[content {
  parts {
    text: "I\'m sorry, but this prompt involves a sensitive topic and I\'m not allowed to generate responses that are potentially harmful or inappropriate."
  }
  role: "model"
}
finish_reason: STOP
index: 0
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
]

The prompt_feedback will tell you which safety filter blocked the prompt:

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

Now provide the same prompt to the model with newly configured safety settings, and you may get a response.

response = model.generate_content('[Questionable prompt here]',
                                  safety_settings={'HARASSMENT':'block_none'})
response.text

Also note that each candidate has its own safety_ratings, in case the prompt passes but the individual responses fail the safety checks.

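Encode messages

The previous sections relied on the SDK to make it easy for you to send prompts to the API. This section offers a fully-typed equivalent to the previous example, so you can better understand the lower-level details of how the SDK encodes messages.

Underlying the Python SDK is the google.ai.generativelanguage client library: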

import google.ai.generativelanguage as glm

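The SDK attempts to convert your message to a glm.Content object, which contains a list of glm.Part objects that each contain either:

  1. a text (string)
  2. inline_data (glm.Blob), where a blob contains binary data and a mime_type.

You can also pass any of these classes as an equivalent dictionary.

So, the fully-typed equivalent to the previous example is: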

model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content(
    glm.Content(
        parts = [
            glm.Part(text="Write a short, engaging blog post based on this picture."),
            glm.Part(
                inline_data=glm.Blob(
                    mime_type='image/jpeg',
                    data=pathlib.Path('image.jpg').read_bytes()
                )
            ),
        ],
    ),
    stream=True)
response.resolve()

to_markdown(response.text[:100] + "... [TRIMMED] ...")

Meal prepping is a great way to save time and money, and it can also help you to eat healthier. By ... [TRIMMED] ...
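
Since these classes can also be passed as equivalent dictionaries, the same message can be sketched as plain Python data. The helper below is hypothetical and assumes the dictionary keys mirror the glm field names used above (parts, text, inline_data, mime_type, data); it only builds the structure and does not call the API.

```python
def image_prompt(text, image_bytes, mime_type="image/jpeg"):
    """Build a dictionary shaped like a glm.Content with a text part and an image part."""
    return {
        "parts": [
            {"text": text},
            {"inline_data": {"mime_type": mime_type, "data": image_bytes}},
        ]
    }


# Placeholder bytes for illustration; in practice use
# pathlib.Path('image.jpg').read_bytes().
message = image_prompt("Write a short, engaging blog post based on this picture.",
                       b"\xff\xd8\xff")
print(message["parts"][0]["text"])
```

The resulting dictionary could then be passed where the typed example passes glm.Content, for instance model.generate_content(message).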

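Multi-turn conversations

While the genai.ChatSession class shown earlier can handle many use cases, it does make some assumptions. If your use case doesn't fit into this chat implementation, it's good to remember that genai.ChatSession is just a wrapper around GenerativeModel.generate_content. In addition to single requests, generate_content can handle multi-turn conversations.

The individual messages are glm.Content objects or compatible dictionaries, as seen in previous sections. As a dictionary, the message requires role and parts keys. The role in a conversation can either be user, which provides the prompts, or model, which provides the responses.

Pass a list of glm.Content objects and it will be treated as a multi-turn chat: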

model = genai.GenerativeModel('gemini-pro')

messages = [
    {'role':'user',
     'parts': ["Briefly explain how a computer works to a young child."]}
]
response = model.generate_content(messages)

to_markdown(response.text)

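Imagine a computer as a really smart friend who can help you with many things. Just like you have a brain to think and learn, a computer has a brain too, called a processor. It's like the boss of the computer, telling it what to do.

Inside the computer, there's a special place called memory, which is like a big storage box. It remembers all the things you tell it to do, like opening games or playing videos.

When you press buttons on the keyboard or click things on the screen with the mouse, you're sending messages to the computer. These messages travel through special wires, called cables, to the processor.

The processor reads the messages and tells the computer what to do. It can open programs, show you pictures, or even play music for you.

All the things you see on the screen are created by the graphics card, which is like a magic artist inside the computer. It takes the processor's instructions and turns them into colorful pictures and videos.

To save your favorite games, videos, or pictures, the computer uses a special storage space called a hard drive. It's like a giant library where the computer can keep all your precious things safe.

And when you want to connect to the internet to play games with friends or watch funny videos, the computer uses something called a network card to send and receive messages through internet cables or Wi-Fi signals.

So, just like your brain helps you learn and play, the computer's processor, memory, graphics card, hard drive, and network card all work together to make your computer a super-smart friend that can help you do amazing things!

To continue the conversation, add the response and another message.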

messages.append({'role':'model',
                 'parts':[response.text]})

messages.append({'role':'user',
                 'parts':["Okay, how about a more detailed explanation to a high school student?"]})

response = model.generate_content(messages)

to_markdown(response.text)

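At its core, a computer is a machine that can be programmed to carry out a set of instructions. It consists of several essential components that work together to process, store, and display information:

1. Processor (CPU): - The brain of the computer. - Executes instructions and performs calculations. - Speed measured in gigahertz (GHz). - More GHz generally means faster processing.

2. Memory (RAM): - Temporary storage for data being processed. - Holds instructions and data while the program is running. - Measured in gigabytes (GB). - More RAM allows more programs to run simultaneously.

3. Storage (HDD/SSD): - Permanent storage for data. - Stores the operating system, programs, and user files. - Measured in gigabytes (GB) or terabytes (TB). - Hard disk drives (HDDs) are traditional, slower, and cheaper. - Solid-state drives (SSDs) are newer, faster, and more expensive.

4. Graphics Card (GPU): - Processes and displays images. - Essential for gaming, video editing, and other graphics-intensive tasks. - Measured in video RAM (VRAM) and clock speed.

5. Motherboard: - Connects all the components. - Provides power and communication pathways.

6. Input/Output (I/O) Devices: - Allow the user to interact with the computer. - Examples: keyboard, mouse, monitor, printer.

7. Operating System (OS): - Software that manages the computer's resources. - Provides a user interface and basic functionality. - Examples: Windows, macOS, Linux.

When you run a program on your computer, the following happens:

  1. The program instructions are loaded from storage into memory.
  2. The processor reads the instructions from memory and executes them one by one.
  3. If an instruction involves calculations, the processor performs them using its arithmetic logic unit (ALU).
  4. If an instruction involves data, the processor reads from or writes to memory.
  5. The results of the calculations or data manipulation are stored in memory.
  6. If the program needs to display something on the screen, it sends the necessary data to the graphics card.
  7. The graphics card processes the data and sends it to the monitor.

This process continues until the program has completed its task or the user terminates it.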
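
The append-and-resend pattern used above can be wrapped in a small helper that also validates the role. This is a sketch under the rule stated earlier, that a message dictionary needs role and parts keys and that role is either user or model; the function name add_turn is hypothetical.

```python
VALID_ROLES = ("user", "model")


def add_turn(messages, role, text):
    """Append a {'role': ..., 'parts': [...]} message and return the list."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    messages.append({"role": role, "parts": [text]})
    return messages


conversation = add_turn([], "user", "Briefly explain how a computer works to a young child.")
# After each call to model.generate_content(conversation), record the reply
# before adding the next user message:
add_turn(conversation, "model", "(model reply goes here)")
add_turn(conversation, "user", "Okay, how about a more detailed explanation?")
print(len(conversation))
```

Keeping the list in your own code, rather than in a ChatSession, makes it straightforward to edit, truncate, or persist the conversation between requests.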

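Generation configuration

The generation_config argument allows you to modify the generation parameters. Every prompt you send to the model includes parameter values that control how the model generates responses.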

model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(
    'Tell me a story about a magic backpack.',
    generation_config=genai.types.GenerationConfig(
        # Only one candidate for now.
        candidate_count=1,
        stop_sequences=['x'],
        max_output_tokens=20,
        temperature=1.0)
)
text = response.text

if response.candidates[0].finish_reason.name == "MAX_TOKENS":
    text += '...'

to_markdown(text)

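Once upon a time, in a small town nestled amidst lush green hills, lived a young girl named...

What's next

  • Prompt design is the process of creating prompts that elicit the desired response from language models. Writing well-structured prompts is an essential part of ensuring accurate, high-quality responses from a language model. Learn about best practices for prompt writing.
  • Gemini offers several model variations to meet the needs of different use cases, such as input types and complexity, implementations for chat or other dialog language tasks, and size constraints. Learn about the available Gemini models.
  • Gemini offers options for requesting rate limit increases. The rate limit for Gemini-Pro models is 60 requests per minute (RPM).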
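
To stay within a limit such as the 60 RPM mentioned above, a client can space out its own requests. The class below is a simple illustrative sketch, not an SDK feature: it enforces a minimum interval between calls (one second at 60 requests per minute). The class name and the injectable clock and sleep parameters are hypothetical, chosen so the logic can be tested without real waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive wait() calls are at least 60/max_per_minute seconds apart."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Typical use (requires the SDK):
#   limiter = MinIntervalLimiter(max_per_minute=60)
#   for prompt in prompts:
#       limiter.wait()
#       response = model.generate_content(prompt)
```

Even spacing is more conservative than the limit strictly requires, since it never sends bursts; it trades some throughput for simplicity.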