使用嵌入功能搜尋文件

前往 ai.google.dev 查看 在 Google Colab 中執行 前往 GitHub 查看原始碼

總覽

這個範例說明如何使用 Gemini API 建立嵌入項目,以便執行文件搜尋。您將使用 Python 用戶端程式庫建構字詞嵌入,以便比較搜尋字串或問題和文件內容。

在這個教學課程中,您將使用嵌入功能在一組文件中執行文件搜尋,以詢問與 Google Car 有關的問題。

先備知識

您可以在 Google Colab 中執行本快速入門導覽課程。

如要在自己的開發環境完成本快速入門導覽課程,請確保您的環境符合下列需求:

  • Python 3.9 以上版本
  • 安裝 jupyter 以執行筆記本。

設定

首先,下載並安裝 Gemini API Python 程式庫。

pip install -U -q google.generativeai
import textwrap
import numpy as np
import pandas as pd

import google.generativeai as genai

# Used to securely store your API key
from google.colab import userdata

from IPython.display import Markdown

取得 API 金鑰

您必須先取得 API 金鑰,才能使用 Gemini API。如果您尚未建立金鑰,請在 Google AI Studio 中按一下滑鼠即可建立金鑰。

取得 API 金鑰

在 Colab 中,將金鑰新增至密鑰管理員下方的「🔑?」。輸入名稱 API_KEY

取得 API 金鑰後,請將其傳遞至 SDK。操作方式有以下兩種:

  • 將金鑰放入 GOOGLE_API_KEY 環境變數中 (SDK 會自動從中取得)。
  • 將金鑰傳遞至 genai.configure(api_key=...)
# Or use `os.getenv('API_KEY')` to fetch an environment variable.
API_KEY=userdata.get('API_KEY')

genai.configure(api_key=API_KEY)
for m in genai.list_models():
  if 'embedContent' in m.supported_generation_methods:
    print(m.name)
models/embedding-001
models/embedding-001

產生嵌入功能

本節將說明如何使用 Gemini API 的嵌入功能,為一段文字產生嵌入。

將 API 變更為使用模型嵌入功能 001 的嵌入

新的嵌入模型「embedding-001」其中有新的工作類型參數和選用標題 (僅適用於 task_type=RETRIEVAL_DOCUMENT)。

這些新參數僅適用於最新的嵌入模型。工作類型如下:

工作類型 說明
RETRIEVAL_QUERY 指定指定文字是搜尋/擷取設定中的查詢。
RETRIEVAL_DOCUMENT 指定文字是搜尋/擷取設定中的文件。
SEMANTIC_SIMILARITY 指定指定文字將用於語意文字相似度 (STS)。
分類 指定要將嵌入用於分類。
叢集 指定嵌入將用於分群。
title = "The next generation of AI for developers and Google Workspace"
sample_text = ("Title: The next generation of AI for developers and Google Workspace"
    "\n"
    "Full article:\n"
    "\n"
    "Gemini API & Google AI Studio: An approachable way to explore and prototype with generative AI applications")

model = 'models/embedding-001'
embedding = genai.embed_content(model=model,
                                content=sample_text,
                                task_type="retrieval_document",
                                title=title)

print(embedding)
{'embedding': [0.034585103, -0.044509504, -0.027291223, 0.0072681927, 0.061689284, 0.03362112, 0.028627988, 0.022681564, 0.04958079, 0.07274552, 0.011150464, 0.04200501, -0.029782884, -0.0041767005, 0.05074771, -0.056339227, 0.051204756, 0.04734613, -0.022025354, 0.025162602, 0.046016376, -0.003416976, -0.024010269, -0.044340927, -0.01520864, -0.013577372, -0.009918958, -0.028144406, -0.00024770075, 0.031201784, -0.072506696, 0.022366496, -0.032672316, -0.0025522006, -0.0019957912, -0.023193765, -0.020633291, -0.014031609, -0.00071676675, -0.0073200124, 0.014770645, -0.09390713, -0.017846372, 0.032825496, 0.017616265, -0.046674345, 0.03469292, 0.03386835, 0.0028274113, -0.07737739, 0.023789782, 0.025950644, 0.06952142, -0.029875675, -0.018693604, 0.007266584, -0.0067282487, 0.000802912, 0.020609016, 0.012406181, -0.018825717, 0.051171597, -0.0080359895, 0.008457639, 0.01197146, -0.080320396, -0.040698495, 0.0018266322, 0.042915005, 0.021464704, 0.022519842, 0.0059912056, 0.050887667, -0.04566639, -0.012651369, -0.14023173, -0.0274054, 0.04492792, 0.014709818, 0.037258334, -0.021294944, -0.041852854, -0.069640376, -0.030281356, -0.0070775123, 0.019886682, -0.050179508, -0.03839318, -0.014652514, 0.03370254, -0.02803748, -0.059206057, 0.055928297, -0.034912255, -0.007784368, 0.098106734, -0.06873356, -0.052850258, -0.011798939, -0.030071719, -0.026038093, 0.016752971, -0.020916667, 0.007365556, 0.017650642, 0.006677715, -0.036498126, 0.02110524, -0.05625146, 0.043038886, -0.06515849, -0.019825866, -0.010379261, -0.037537806, 0.017674655, -0.042821705, 0.014320703, 0.036735073, 0.011445211, 0.027352763, -0.0028090556, 0.009011982, 0.024146665, 0.002215841, -0.07397819, 0.008714616, -0.03377923, 0.034349587, 0.022429721, 0.052665956, -0.0021583177, -0.040462274, -0.019938014, 0.030099798, 0.009743918, 0.009111553, 0.026379738, -0.015910586, 0.010171418, 0.023996552, -0.031924065, 0.024775924, 0.014129728, 0.008913726, -0.010156162, 0.05407575, -0.080851324, 0.022005167, 0.012674272, -0.017213775, -0.009514327, 0.03276702, -0.06795425, -0.0004906647, 0.036379207, 0.034329377, -0.037122324, 0.05565231, -0.0038797501, 0.009620726, 0.050033607, 0.0084967585, 0.050638147, 0.00490447, 0.006675041, -0.04295331, -0.006490465, 0.010016808, -0.011493882, 0.023702862, 0.029825455, 0.03514081, -0.013388401, -0.05283049, 0.00019729362, -0.05095579, -0.031205554, 0.0045187837, -0.0066217924, -0.007931168, -0.0030577614, -0.016934164, 0.04188085, 0.050768845, 0.009407336, -0.02838461, 0.079967216, -0.038705315, -0.06723827, 0.015558192, -0.043977134, -0.022096274, -0.0053875325, -0.022216668, 0.013843675, 0.04506347, 0.051535256, 0.033484843, 0.044276737, -0.01299742, 0.021727907, 0.06798745, 0.038896713, 0.0023941514, 0.00815586, 0.029679826, 0.109524906, 0.012102062, -0.058510404, 0.03252702, -0.050666984, -0.006376317, 0.026164565, 0.008671174, 0.05052107, -0.027606683, 0.005126455, -0.0029112308, -0.015136989, -0.026336055, -0.031090762, 0.01717387, -0.03679281, -0.008987327, -0.0015111889, 0.0951955, -0.047756936, 0.03215895, 0.0029104433, -0.026967648, 0.015690766, 0.072443135, 0.039804243, 0.019212538, 0.08688796, -0.006074699, 0.015716698, 0.01919827, 0.030602958, 0.008902454, -0.046521842, 0.01976686, 0.051571846, 0.022742877, -0.04307271, -0.016526582, -0.03293306, 0.056195326, 0.0034229455, 0.022546848, -0.03803692, -0.051709678, 0.006613695, -0.0014020284, -0.036669895, -0.001721542, -0.08655083, -0.052215993, -0.032110028, 0.02565277, 0.04519586, -0.049954705, 0.0012014605, -0.037857044, -0.017148033, -0.026822135, 0.031737078, 0.028569039, -0.022907747, 0.024690803, -0.029206393, -0.032036074, 0.039650604, 0.021772616, -0.021436188, 0.045968816, -0.010048652, 0.030124044, 0.03935015, -0.04809066, 0.023686275, 0.02167442, 0.044297505, -0.073465124, -0.030082388, 0.017143175, -0.03342189, -0.0330694, -0.0122910105, -0.051963367, -0.058639623, -0.008972449, -0.022521269, -0.022892935, -0.035436112, 0.0034948539, -0.005295366, 0.05993406, 0.027561562, -0.010693112, 0.0009929353, -0.08425568, -0.02769792, -0.061596338, 0.036154557, -0.037945468, -0.03125497, -0.030945951, 0.04039234, 0.06636523, 0.016889103, -0.003046984, -0.011618148, 0.0011459244, 0.08574449, 0.036592126, -0.051252075, 0.013240978, -0.004678898, 0.0855428, -0.009402003, 0.028451374, -0.020148227, 0.0028894239, -0.02822095, 0.0315999, -0.057231728, 0.0004925584, -0.019411521, 0.021964703, 0.009169671, 0.01635917, -0.035817493, 0.052273333, -0.0009408905, 0.018396556, -0.041456044, 0.019532038, -0.0034153357, -0.034743972, 0.0027093922, 0.00044865624, 0.0023108325, -0.04501131, 0.05044232, -0.034571823, -0.039061558, 0.008809692, 0.068560965, 0.015274846, 0.023746625, 0.043649375, -0.028320875, -0.009765932, -0.009430268, -0.055888545, 0.047219332, 0.023080856, 0.064999744, -0.039562706, 0.0501819, 0.046483964, -0.009398194, -0.0013862611, 0.014837316, 0.045558825, 0.016926765, 0.03220044, 0.003780334, 0.040371794, 0.00057833333, -0.04805651, 0.01602842, -0.005916167, -0.0020399855, 0.036410075, -0.09505558, -0.021768136, 0.021421269, 0.024159726, -0.013026249, -0.023113504, 0.02459358, 0.01643742, -0.0104496805, 0.033115752, 0.047128692, 0.05519812, -0.013151745, 0.03202098, 0.0014973703, -0.009810199, 0.09950044, 0.03161514, 0.022533545, 0.028800217, 0.011425177, -0.06616128, 0.018490529, -0.024615118, -0.01714155, -0.036444064, -0.024078121, 6.236274e-05, -0.025733253, -0.012052791, -0.0032004463, -0.007022415, -0.07943268, -0.010401283, 0.014510383, -0.017218677, 0.056253612, -0.028017681, -0.06288073, -0.0010291388, 0.042233694, -0.017423663, -0.014384363, 0.008450004, -0.006025767, 0.00068278343, 0.043332722, -0.048530027, -0.10272868, 0.016439026, -0.0043581687, 0.014065921, 0.015250153, 0.0035983857, 0.024789328, 0.052941743, 0.0023809967, -0.0041563907, -0.02350335, -0.05152261, -0.026173577, 0.025396436, -0.020441707, 0.0052804356, 0.017074147, -0.023429962, 0.028667469, -0.056579348, -0.045674913, -0.050122924, -0.029717976, 0.011392094, 0.01918305, -0.090463236, 0.011211278, -0.058831867, -0.027594091, -0.08303421, -0.014075257, -0.013071177, 0.0050326143, 0.024727797, -0.004616583, -0.007565293, 0.0043535405, -0.05543633, -0.022187654, -0.026209656, 0.064442314, -0.0066669765, -0.002169784, -0.019930722, 4.8227314e-05, -0.0015547068, -0.0057820054, -0.08949447, -0.0115463175, -0.026195917, -0.008628893, -0.0017553791, -0.08588936, 0.008043627, -0.040522296, -0.006249298, -0.040554754, 0.021548215, 0.049422685, -0.008809529, -0.024933426, -0.040077355, 0.038274486, 0.029687686, -0.02959238, 0.0426982, 0.029072417, 0.049369767, -0.018109215, -0.041628513, -0.005594527, 0.026668772, -0.027726736, 0.037220005, 0.058132544, 0.01863369, -0.04707943, -0.0006536238, -0.012569923, 0.01520091, 0.05510794, -0.05035494, 0.036055118, -0.020710817, -0.0051193447, -0.042542584, 0.0020174137, 0.0014168078, -0.001090868, -0.034683146, 0.06309216, -0.05918888, 0.017469395, 0.025378557, 0.046790935, 0.008669848, 0.07935556, -0.016844809, -0.08596125, -0.037868172, 0.0057407417, -0.04262457, 0.0036744277, -0.04798243, 0.010448024, 0.005311227, -0.025689157, 0.051566023, -0.053452246, -0.033347856, -0.014070289, -0.001457106, 0.056622982, -0.037253298, -0.0010763579, 0.025846632, -0.017852046, -0.035092466, 0.0293208, 0.035001587, -0.002458465, -0.0032884434, -0.011247537, -0.03308368, 0.027546775, -0.0197189, -0.019373588, 0.012695445, -0.00846602, 0.0006254506, 0.022446852, -0.021224227, -0.016343568, -0.008488644, 0.009065775, -0.0038449552, -0.036945608, 0.035750583, 0.0021798566, 0.007781292, 0.07929656, -0.017595762, -0.020934578, -0.03354823, 0.04495828, -0.008365722, -0.040300835, 0.0006642716, 0.0568309, 0.016416628, 0.0722137, -0.01774583, -0.0492021, -0.0020490142, -0.049469862, 0.043543257, 0.04398881, 0.025031362, -0.0063477345, 0.062346347, -0.040481493, -0.02257938, 0.009280532, 0.010731656, 0.02230327, 0.002849086, -0.05473455, 0.047677275, -0.02363733, 0.029837264, -0.020835804, -0.017142115, 0.006764067, -0.01684698, 0.021653073, 0.040238675, -0.018611673, -0.04561582, 0.038430944, -0.02677326, 0.007663415, 0.06948015, -0.0012032362, 0.008699309, 0.011357286, 0.021917833, 0.00018160013, -0.076829135, 0.0023802964, -0.023293033, -0.03534673, -0.042327877, -0.0210994, 0.042625647, -0.014360755, -0.0066886684, 0.03561479, 0.047778953, 0.037118394, 0.041420408, 0.052272875, 0.039208084, -0.033506226, -0.00651392, 0.062439967, 0.03669325, 0.042872086, 0.066822834, -0.0068043126, -0.021161819, -0.050757803, 0.005068388, -0.0027463334, 0.013415453, -0.033819556, -0.046399325, -0.03287996, -0.019854786, -0.0070042396, -0.00042829785, -0.036087025, -0.00650163, 0.0008774728, -0.10458266, -0.061043933, 0.016721264, 0.0002953045, -0.0053018867, 0.012741255, 0.0050292304, 0.024298942, 0.0033208653, -0.0629338, -0.0005545099, 0.04004244, -0.03548021, -0.02479493, 0.035712432, -0.017079322, -0.030503469, 0.0019789268, -0.028768733, -0.054890547, -0.08133776, -0.03006806, -0.016685534, -0.073403284, 0.05233739, 0.033545494, 0.0035976092, 0.040786255, 0.056786384, 0.013151219, 0.042795595, 0.009594162, 0.00945792, 0.024018744, -0.045365516, -0.050492898, 0.038503986, 0.012790262, 0.0142914, 0.014998696, 0.0071202153, -0.0038871064, 0.010770397, 0.016789515, -0.041323792, 0.010311674, -0.009053558, 0.034749016, 0.005213924, -0.041184388, -0.0033388685, 0.04279652, 0.04068113, -0.024129236, -0.0059263078, 0.027970677, -0.024706231, 0.02846046, -0.0011169978, -0.059880134, 0.02713591, -0.0027713599, 0.040187914, 0.035978075, -0.06281134, -0.08345513, -0.006073032, -0.02095529, -0.018988023, -0.035680003, 0.04972727, -0.009011115, 0.054317664, 0.005172075, 0.031131523, -0.00069823023, 0.0108121475, -0.06091403, 0.049459387, -0.007036548, -0.014955144, -0.02104843, 0.035405546, 0.043375615, -0.042294793, -0.025417345, -0.015245514, 0.023398506, 0.002263163, -0.0071430253, 0.043531902, -0.03357511, -0.09097121, -0.04729407, -0.013593756, 0.023449646, 0.039015424, 0.027113337, -0.05169247, -0.016909705, -0.0057588373, -0.009955609, -0.05562937, -0.052671663, 0.003173363, -0.0022836009, 0.036742315, 0.047324646, -0.033285677, 0.012819869, -0.01939692, -0.0047737034, -0.011794656, -0.045633573, -0.0013346534, 0.016130142, -0.066292875, 0.029637614, 0.057662483, -0.035122138, 0.068166904]}

建立嵌入資料庫

以下是用來建構嵌入資料庫的三個範例文字。您將使用 Gemini API 為每份文件建立嵌入項目。將資料集轉換為 DataFrame,提升圖表的呈現效果。

DOCUMENT1 = {
    "title": "Operating the Climate Control System",
    "content": "Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console.  Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it."}
DOCUMENT2 = {
    "title": "Touchscreen",
    "content": "Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the \"Navigation\" icon to get directions to your destination or touch the \"Music\" icon to play your favorite songs."}
DOCUMENT3 = {
    "title": "Shifting Gears",
    "content": "Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions."}

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]

將字典內容整理成 DataFrame,以改善圖表呈現結果。

df = pd.DataFrame(documents)
df.columns = ['Title', 'Text']
df

取得這些文字內文的嵌入。將這項資訊新增至 DataFrame。

# Get the embeddings of each text and add to an embeddings column in the dataframe
def embed_fn(title, text):
  return genai.embed_content(model=model,
                             content=text,
                             task_type="retrieval_document",
                             title=title)["embedding"]

df['Embeddings'] = df.apply(lambda row: embed_fn(row['Title'], row['Text']), axis=1)
df

透過問與答功能搜尋文件

嵌入已產生,讓我們建立問答系統來搜尋這些文件。您將詢問超參數調整的相關問題、建立問題嵌入項目,並與 DataFrame 中的嵌入集合比較。

題目的嵌入會是向量 (浮點值清單),並將採用內積的積積物與文件向量進行比較。API 傳回的這個向量已經過正規化。內積代表兩個向量之間的方向相似度。

點積點的值可介於 -1 到 1 之間 (含首尾)。如果兩個向量之間的內積為 1,則向量的方向相同。如果內積值為 0,則這些向量彼此相鄰或不相關。最後,如果點積為 -1,則向量的指向方向相反,彼此不相似。

請注意,使用新的嵌入模型 (embedding-001) 時,請將工作類型指定為使用者查詢的 QUERY,嵌入文件文字時則將工作類型指定為 DOCUMENT

工作類型 說明
RETRIEVAL_QUERY 指定指定文字是搜尋/擷取設定中的查詢。
RETRIEVAL_DOCUMENT 指定文字是搜尋/擷取設定中的文件。
query = "How do you shift gears in the Google car?"
model = 'models/embedding-001'

request = genai.embed_content(model=model,
                              content=query,
                              task_type="retrieval_query")

使用 find_best_passage 函式計算內積,然後依據從大到小的點積值排序 DataFrame,從資料庫中擷取相關段落。

def find_best_passage(query, dataframe):
  """
  Compute the distances between the query and each document in the dataframe
  using the dot product.
  """
  query_embedding = genai.embed_content(model=model,
                                        content=query,
                                        task_type="retrieval_query")
  dot_products = np.dot(np.stack(dataframe['Embeddings']), query_embedding["embedding"])
  idx = np.argmax(dot_products)
  return dataframe.iloc[idx]['Text'] # Return text from index with max value

檢視資料庫中最相關的文件:

passage = find_best_passage(query, df)
passage
'Shifting Gears  Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions.'

問答申請

讓我們嘗試使用文字生成 API 建立 Q &一個系統。請在下方輸入您的自訂資料,建立簡單的問題和回答範例。你仍將以內心產品做為相似度指標。

def make_prompt(query, relevant_passage):
  escaped = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
  prompt = textwrap.dedent("""You are a helpful and informative bot that answers questions using text from the reference passage included below. \
  Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
  However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \
  strike a friendly and converstional tone. \
  If the passage is irrelevant to the answer, you may ignore it.
  QUESTION: '{query}'
  PASSAGE: '{relevant_passage}'

    ANSWER:
  """).format(query=query, relevant_passage=escaped)

  return prompt
prompt = make_prompt(query, passage)
print(prompt)
You are a helpful and informative bot that answers questions using text from the reference passage included below.   Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.   However, you are talking to a non-technical audience, so be sure to break down complicated concepts and   strike a friendly and converstional tone.   If the passage is irrelevant to the answer, you may ignore it.
  QUESTION: 'How do you shift gears in the Google car?'
  PASSAGE: 'Shifting Gears  Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions.'

    ANSWER:

選擇任一 Gemini 內容生成模型,查看問題解答。

for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)
models/gemini-pro
models/gemini-ultra
model = genai.GenerativeModel('gemini-1.5-pro-latest')
answer = model.generate_content(prompt)
Markdown(answer.text)

在提供的段落未包含如何在 Google 車上更換裝備的資訊,因此我們無法透過這個來源回答你的問題。

後續步驟

如要進一步瞭解如何使用嵌入功能,請參閱下列其他教學課程: