Get started with semantic retrieval

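Overview

Large language models (LLMs) can learn new abilities without being directly trained on them. However, LLMs are known to "hallucinate" when asked to answer questions they were not trained on. This is partly because LLMs are unaware of events that happen after training. It is also very difficult to trace the sources an LLM draws its responses from. For reliable, scalable applications, an LLM's responses must be grounded in facts and able to cite their information sources.

A common approach to overcoming these constraints is Retrieval Augmented Generation (RAG). RAG uses an Information Retrieval (IR) mechanism to fetch relevant data from an external knowledge base and adds that data to the prompt sent to the LLM. The knowledge base can be your own corpus of documents, databases, or APIs.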
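The retrieve-then-augment flow can be made concrete with a minimal, self-contained sketch. This is not how the Semantic Retriever API works internally. A toy word-overlap score stands in for semantic search, and the names KNOWLEDGE_BASE, retrieve, and build_augmented_prompt are hypothetical, introduced only for illustration.

```python
import re

# Hypothetical toy knowledge base; in this guide, a Semantic Retriever corpus
# plays this role.
KNOWLEDGE_BASE = [
    "Project IDX is an experiment to improve full-stack, multiplatform app development.",
    "Gemini supports multimodal prompting with text and images.",
    "Chunks can carry up to 20 custom metadata entries.",
]


def _words(text: str) -> set:
    """Lowercase alphanumeric tokens of `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list, k: int = 2) -> list:
    """Toy IR step: rank passages by word overlap with the query."""
    query_words = _words(query)
    scored = [(len(query_words & _words(p)), i, p) for i, p in enumerate(passages)]
    # Highest overlap first; ties keep the original order via the index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [p for score, _, p in scored[:k] if score > 0]


def build_augmented_prompt(query: str, passages: list) -> str:
    """Augmentation step: prepend the retrieved passages to the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


print(build_augmented_prompt("What is Project IDX?", KNOWLEDGE_BASE))
```

In the rest of this guide, a Semantic Retriever corpus replaces the toy knowledge base, and the AQA model performs the grounded generation.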

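This notebook walks you through a workflow for answering questions. It augments an LLM's knowledge with an external text corpus, then performs semantic information retrieval using the Semantic Retriever and Attributed Question & Answering (AQA) APIs of the Generative Language API.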

Setup

Import the Generative Language API

# Install the client library (Semantic Retriever is only supported for versions >0.4.0)
!pip install -U google.ai.generativelanguage

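Authenticate

The Semantic Retriever API lets you run semantic search over your own data. Because it is your data, it needs stricter access controls than API keys provide. Authenticate with OAuth, using either a service account or your user credentials.

This quickstart uses a simplified authentication approach meant for a testing environment, and service account setups are typically easier to start with. For a production environment, learn about authentication and authorization before choosing the access credentials appropriate for your app.

Set up OAuth using service accounts

To set up OAuth using service accounts, follow these steps:

Enable the Generative Language API.

Create the service account by following the documentation.
- After creating the service account, generate a service account key.

Upload your service account file by clicking the file icon in the left sidebar and then the upload icon, as shown in the screenshot below.
- Rename the uploaded file to service_account_key.json, or change the variable service_account_file_name in the code below.

!pip install -U google-auth-oauthlib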

service_account_file_name = 'service_account_key.json'

from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(service_account_file_name)

scoped_credentials = credentials.with_scopes(
    ['https://www.googleapis.com/auth/cloud-platform', 'https://www.googleapis.com/auth/generative-language.retriever'])

Initialize the client library using the service account credentials.

import google.ai.generativelanguage as glm
generative_service_client = glm.GenerativeServiceClient(credentials=scoped_credentials)
retriever_service_client = glm.RetrieverServiceClient(credentials=scoped_credentials)
permission_service_client = glm.PermissionServiceClient(credentials=scoped_credentials)

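Create a corpus

The Semantic Retriever API lets you define up to 5 custom text corpora per project. You can specify either of the following fields when defining a corpus:

name: The Corpus resource name (ID). Must contain at most 40 alphanumeric characters. If name is empty on creation, a unique name is generated with a maximum length of 40 characters, consisting of a prefix derived from display_name and a 12-character random suffix.
display_name: The human-readable display name for the Corpus. Must contain at most 512 characters, including alphanumerics, spaces, and dashes.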
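If you choose your own IDs, a small local check can catch invalid names before the request is sent. This is a sketch under stated assumptions. It encodes the rules described in this guide: at most 40 characters, only letters, digits, or dashes, and no leading or trailing dash. It also restricts IDs to lowercase, an extra assumption based on the generated example name google-for-developers-blog-dqrtz8rs0jg. The function is_valid_resource_id is hypothetical.

```python
import re

# Assumed rules, taken from the field descriptions in this guide: 1-40
# characters, letters/digits or dashes, no leading or trailing dash.
# Lowercase-only is an extra assumption based on the generated example name.
_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,38}[a-z0-9])?")


def is_valid_resource_id(resource_id: str) -> bool:
    """Return True if the ID (without the "corpora/" prefix) fits the assumed rules."""
    return _ID_PATTERN.fullmatch(resource_id) is not None


for candidate in ["google-for-developers-blog-dqrtz8rs0jg", "-draft", "a" * 41]:
    print(candidate, is_valid_resource_id(candidate))
```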

example_corpus = glm.Corpus(display_name="Google for Developers Blog")
create_corpus_request = glm.CreateCorpusRequest(corpus=example_corpus)

# Make the request
create_corpus_response = retriever_service_client.create_corpus(create_corpus_request)

# Set the `corpus_resource_name` for subsequent sections.
corpus_resource_name = create_corpus_response.name
print(create_corpus_response)

name: "corpora/google-for-developers-blog-dqrtz8rs0jg"
display_name: "Google for Developers Blog"
create_time {
  seconds: 1713497533
  nanos: 587977000
}
update_time {
  seconds: 1713497533
  nanos: 587977000
}

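Get the created corpus

Use the GetCorpusRequest method to access the Corpus you created above programmatically. The name parameter is the full resource name of the Corpus, which the cell above stores in corpus_resource_name. The expected format is corpora/corpus-123.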

get_corpus_request = glm.GetCorpusRequest(name=corpus_resource_name)

# Make the request
get_corpus_response = retriever_service_client.get_corpus(get_corpus_request)

# Print the response
print(get_corpus_response)

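Create a document

A Corpus can contain up to 10,000 Documents. You can specify either of the following fields when defining a document:

name: The Document resource name (ID). Must contain at most 40 characters (alphanumeric or dashes only). The ID cannot start or end with a dash. If the name is empty on creation, a unique name is derived from display_name with a 12-character random suffix.
display_name: The human-readable display name. Must contain at most 512 characters, including alphanumerics, spaces, and dashes.

Documents also support up to 20 user-specified custom_metadata fields, specified as key-value pairs. Custom metadata can be strings, lists of strings, or numbers. A list of strings can hold at most 10 values, and numeric values are represented as floating-point numbers in the API.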
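These limits can be checked locally before you build glm.CustomMetadata objects. The sketch below uses a hypothetical helper, normalize_custom_metadata, that works on a plain Python dict rather than the client library types. It assumes the three value kinds listed above and converts numbers to float to mirror how the API represents them.

```python
MAX_METADATA_ENTRIES = 20    # per Document or Chunk, as stated above
MAX_STRING_LIST_VALUES = 10  # per string-list value, as stated above


def normalize_custom_metadata(metadata: dict) -> dict:
    """Check the documented limits and coerce numbers to float.

    Raises ValueError if a limit is exceeded or a value has an unsupported type.
    """
    if len(metadata) > MAX_METADATA_ENTRIES:
        raise ValueError(
            f"at most {MAX_METADATA_ENTRIES} entries allowed, got {len(metadata)}")
    normalized = {}
    for key, value in metadata.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; reject it explicitly.
            raise ValueError(f"{key}: booleans are not one of the supported types")
        if isinstance(value, str):
            normalized[key] = value
        elif isinstance(value, (int, float)):
            # The API represents numeric values as floating-point numbers.
            normalized[key] = float(value)
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            if len(value) > MAX_STRING_LIST_VALUES:
                raise ValueError(
                    f"{key}: at most {MAX_STRING_LIST_VALUES} strings allowed")
            normalized[key] = list(value)
        else:
            raise ValueError(f"{key}: unsupported type {type(value).__name__}")
    return normalized


print(normalize_custom_metadata({
    "url": "https://developers.googleblog.com/",
    "publish_date": 20230808,
    "tags": ["Google For Developers", "Project IDX"],
}))
```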

# Create a document with a custom display name.
example_document = glm.Document(display_name="Introducing Project IDX, An Experiment to Improve Full-stack, Multiplatform App Development")

# Add metadata.
# Metadata also supports numeric values not specified here
document_metadata = [
    glm.CustomMetadata(key="url", string_value="https://developers.googleblog.com/2023/08/introducing-project-idx-experiment-to-improve-full-stack-multiplatform-app-development.html")]
example_document.custom_metadata.extend(document_metadata)

# Make the request
# corpus_resource_name is a variable set in the "Create a corpus" section.
create_document_request = glm.CreateDocumentRequest(parent=corpus_resource_name, document=example_document)
create_document_response = retriever_service_client.create_document(create_document_request)

# Set the `document_resource_name` for subsequent sections.
document_resource_name = create_document_response.name
print(create_document_response)

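Get the created document

Use the GetDocumentRequest method to access the document you created above programmatically. The name parameter is the full resource name of the document, which the cell above stores in document_resource_name. The expected format is corpora/corpus-123/documents/document-123.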
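Every resource name nests its parent's name, so the IDs can be recovered by splitting on "/". The helper below is a hypothetical sketch. The corpus and document segments follow the formats stated above. The trailing chunks/{chunk} segment is an assumption based on the same pattern.

```python
from typing import NamedTuple, Optional


class ResourceName(NamedTuple):
    corpus_id: str
    document_id: Optional[str] = None
    chunk_id: Optional[str] = None


def parse_resource_name(name: str) -> ResourceName:
    """Split corpora/{c}[/documents/{d}[/chunks/{k}]] into its IDs."""
    parts = name.split("/")
    collections = ("corpora", "documents", "chunks")
    if len(parts) not in (2, 4, 6):
        raise ValueError(f"unexpected resource name: {name!r}")
    ids = []
    for i in range(0, len(parts), 2):
        # Each collection segment must appear in order and be followed by a non-empty ID.
        if parts[i] != collections[i // 2] or not parts[i + 1]:
            raise ValueError(f"unexpected resource name: {name!r}")
        ids.append(parts[i + 1])
    return ResourceName(*ids)


print(parse_resource_name("corpora/corpus-123/documents/document-123"))
```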

get_document_request = glm.GetDocumentRequest(name=document_resource_name)

# Make the request
# document_resource_name is a variable set in the "Create a document" section.
get_document_response = retriever_service_client.get_document(get_document_request)

# Print the response
print(get_document_response)

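Ingest and chunk a document

To improve the relevance of content returned by the vector database during semantic retrieval, break large documents into smaller pieces, or chunks, while ingesting them.

A Chunk is a subpart of a Document that is treated as an independent unit for vector representation and storage. A Chunk can have at most 2043 tokens. A Corpus can have at most 1 million Chunks.

Like Documents, Chunks support up to 20 user-specified custom_metadata fields, specified as key-value pairs. Custom metadata can be strings, lists of strings, or numbers. A list of strings can hold at most 10 values, and numeric values are represented as floating-point numbers in the API.

This guide uses Google's open-source HtmlChunker.

Other chunkers you can use include LangChain and LlamaIndex.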
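The core idea of chunking is to group consecutive paragraphs into passages that stay under a size budget. That idea can be sketched in plain Python. This is a simplified stand-in, not HtmlChunker's actual algorithm. It counts whitespace-separated words as a rough proxy for tokens, and the function name aggregate_passages is hypothetical.

```python
def aggregate_passages(paragraphs: list, max_words: int = 200) -> list:
    """Greedily merge consecutive paragraphs while staying within `max_words`.

    A paragraph longer than `max_words` on its own becomes its own passage
    (this sketch does not split it further).
    """
    passages = []
    current = []
    current_words = 0
    for paragraph in paragraphs:
        words = len(paragraph.split())
        # Close the current passage if adding this paragraph would exceed the budget.
        if current and current_words + words > max_words:
            passages.append(" ".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += words
    if current:
        passages.append(" ".join(current))
    return passages


print(aggregate_passages(["a b c", "d e", "f g h i"], max_words=5))
```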

Ingest HTML and chunk it with HtmlChunker

!pip install google-labs-html-chunker

from google_labs_html_chunker.html_chunker import HtmlChunker

from urllib.request import urlopen

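Get the website's HTML DOM. Here the HTML is read directly. It would be better to capture the HTML after rendering so that JavaScript-injected HTML is included, for example document.documentElement.innerHTML.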

with(urlopen("https://developers.googleblog.com/2023/08/introducing-project-idx-experiment-to-improve-full-stack-multiplatform-app-development.html")) as f:
  html = f.read().decode("utf-8")

Break the text document into passages and create Chunks from these passages. This step only creates the Chunk objects. The next section uploads them to the Semantic Retriever API.

# Chunk the file using HtmlChunker
chunker = HtmlChunker(
    max_words_per_aggregate_passage=200,
    greedily_aggregate_sibling_nodes=True,
    html_tags_to_exclude={"noscript", "script", "style"},
)
passages = chunker.chunk(html)
print(passages)


# Create `Chunk` entities.
chunks = []
for passage in passages:
    chunk = glm.Chunk(data={'string_value': passage})
    # Optionally, you can add metadata to a chunk
    chunk.custom_metadata.append(glm.CustomMetadata(key="tags",
                                                    string_list_value=glm.StringList(
                                                        values=["Google For Developers", "Project IDX", "Blog", "Announcement"])))
    chunk.custom_metadata.append(glm.CustomMetadata(key="chunking_strategy",
                                                    string_value="greedily_aggregate_sibling_nodes"))
    chunk.custom_metadata.append(glm.CustomMetadata(key = "publish_date",
                                                    numeric_value = 20230808))
    chunks.append(chunk)
print(chunks)

Batch create chunks

Create chunks in batches. Each batch request can specify at most 100 chunks.

Use CreateChunk() to create a single chunk.

# Option 1: Use the chunks created with HtmlChunker in the section above.
# `chunks` is the variable set in the section above.
create_chunk_requests = []
for chunk in chunks:
  create_chunk_requests.append(glm.CreateChunkRequest(parent=document_resource_name, chunk=chunk))

# Make the requests. Each batch request accepts at most 100 chunks.
for start in range(0, len(create_chunk_requests), 100):
  request = glm.BatchCreateChunksRequest(parent=document_resource_name,
                                         requests=create_chunk_requests[start:start + 100])
  response = retriever_service_client.batch_create_chunks(request)
  print(response)

You can also create chunks without using HtmlChunker.

# Add up to 100 CreateChunk requests per batch request.
# document_resource_name is a variable set in the "Create a document" section.
chunks = []
chunk_1 = glm.Chunk(data={'string_value': "Chunks support user specified metadata."})
chunk_1.custom_metadata.append(glm.CustomMetadata(key="section",
                                                  string_value="Custom metadata filters"))
chunk_2 = glm.Chunk(data={'string_value': "The maximum number of metadata supported is 20"})
chunk_2.custom_metadata.append(glm.CustomMetadata(key = "num_keys",
                                                  numeric_value = 20))
chunks = [chunk_1, chunk_2]
create_chunk_requests = []
for chunk in chunks:
  create_chunk_requests.append(glm.CreateChunkRequest(parent=document_resource_name, chunk=chunk))

# Make the request
request = glm.BatchCreateChunksRequest(parent=document_resource_name, requests=create_chunk_requests)
response = retriever_service_client.batch_create_chunks(request)
print(response)

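List `Chunk`s and get their state

Use the ListChunksRequest method to get all available Chunks as a paginated list. Each page holds at most 100 Chunks, sorted in ascending order of Chunk.create_time. If you do not specify a limit, at most 10 Chunks are returned.

To retrieve the next page, pass the next_page_token returned in the ListChunksRequest response as an argument to the next request. When paginating, all other parameters passed to ListChunks must match the call that provided the page token.

Every Chunk returns a state. Use it to check the state of your Chunks before querying a Corpus. Chunk states include UNSPECIFIED, PENDING_PROCESSING, ACTIVE, and FAILED. You can only query ACTIVE Chunks.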

# Make the request
request = glm.ListChunksRequest(parent=document_resource_name)
list_chunks_response = retriever_service_client.list_chunks(request)
for index, chunk in enumerate(list_chunks_response.chunks):
  print(f'\nChunk # {index + 1}')
  print(f'Resource Name: {chunk.name}')
  # Only ACTIVE chunks can be queried.
  print(f'State: {glm.Chunk.State(chunk.state).name}')
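If a document has more Chunks than fit on one page, the next_page_token protocol described above can be followed in a loop. The sketch below models that protocol with a hypothetical in-memory fetch_page callable instead of the real client, so it runs without credentials. With the real client, you would pass next_page_token as page_token in a new ListChunksRequest that otherwise repeats the same parameters. Depending on the client library version, list_chunks may already return a pager that iterates across pages for you, so check the client reference first.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (items, next_page_token); an empty token means no more pages.
Page = Tuple[List[dict], str]


def iterate_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item across pages by following next_page_token."""
    page_token: Optional[str] = None
    while True:
        items, next_page_token = fetch_page(page_token)
        yield from items
        if not next_page_token:
            return
        page_token = next_page_token


# Hypothetical in-memory stand-in for ListChunks: 5 chunks, 2 per page.
FAKE_CHUNKS = [
    {"name": f"corpora/c/documents/d/chunks/{i}", "state": state}
    for i, state in enumerate(
        ["ACTIVE", "PENDING_PROCESSING", "ACTIVE", "FAILED", "ACTIVE"])
]


def fake_list_chunks(page_token: Optional[str], page_size: int = 2) -> Page:
    start = int(page_token) if page_token else 0
    end = start + page_size
    next_page_token = str(end) if end < len(FAKE_CHUNKS) else ""
    return FAKE_CHUNKS[start:end], next_page_token


# Only ACTIVE chunks can be queried, so keep just those.
active_names = [c["name"] for c in iterate_pages(fake_list_chunks)
                if c["state"] == "ACTIVE"]
print(active_names)
```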

Ingest another document

Add another Document with HtmlChunker, then add filters.

# Create a document with a custom display name.
example_document = glm.Document(display_name="How it’s Made: Interacting with Gemini through multimodal prompting")

# Add document metadata.
# Metadata also supports numeric values not specified here
document_metadata = [
    glm.CustomMetadata(key="url", string_value="https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html")]
example_document.custom_metadata.extend(document_metadata)

# Make the CreateDocument request
# corpus_resource_name is a variable set in the "Create a corpus" section.
create_document_request = glm.CreateDocumentRequest(parent=corpus_resource_name, document=example_document)
create_document_response = retriever_service_client.create_document(create_document_request)

# Set the `document_resource_name` for subsequent sections.
document_resource_name = create_document_response.name
print(create_document_response)

# Chunks - add another webpage from Google for Developers
with(urlopen("https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html")) as f:
  html = f.read().decode("utf-8")

# Chunk the file using HtmlChunker
chunker = HtmlChunker(
    max_words_per_aggregate_passage=100,
    greedily_aggregate_sibling_nodes=False,
)
passages = chunker.chunk(html)

# Create `Chunk` entities.
chunks = []
for passage in passages:
    chunk = glm.Chunk(data={'string_value': passage})
    chunk.custom_metadata.append(glm.CustomMetadata(key="tags",
                                                    string_list_value=glm.StringList(
                                                        values=["Google For Developers", "Gemini API", "Blog", "Announcement"])))
    chunk.custom_metadata.append(glm.CustomMetadata(key="chunking_strategy",
                                                    string_value="no_aggregate_sibling_nodes"))
    chunk.custom_metadata.append(glm.CustomMetadata(key = "publish_date",
                                                    numeric_value = 20231206))
    chunks.append(chunk)

# Make the requests. Each batch request accepts at most 100 chunks.
create_chunk_requests = []
for chunk in chunks:
  create_chunk_requests.append(glm.CreateChunkRequest(parent=document_resource_name, chunk=chunk))
for start in range(0, len(create_chunk_requests), 100):
  request = glm.BatchCreateChunksRequest(parent=document_resource_name,
                                         requests=create_chunk_requests[start:start + 100])
  response = retriever_service_client.batch_create_chunks(request)
  print(response)

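Query the corpus

Use the QueryCorpusRequest method to perform a semantic search and get relevant passages.

results_count: The number of passages to return. The maximum is 100. If unspecified, the API returns at most 10 Chunks.
metadata_filters: Filter by chunk_metadata or document_metadata. Each MetadataFilter must correspond to a unique key. Multiple MetadataFilter objects are joined by logical AND. Conditions within the same metadata filter are joined by logical OR. Some examples: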

(year >= 2020 OR year < 2010) AND (genre = drama OR genre = action)

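metadata_filter = [
    glm.MetadataFilter(
        key="document.custom_metadata.year",
        conditions=[
            glm.Condition(numeric_value=2020, operation=glm.Condition.Operator.GREATER_EQUAL),
            glm.Condition(numeric_value=2010, operation=glm.Condition.Operator.LESS)]),
    glm.MetadataFilter(
        key="document.custom_metadata.genre",
        conditions=[
            glm.Condition(string_value="drama", operation=glm.Condition.Operator.EQUAL),
            glm.Condition(string_value="action", operation=glm.Condition.Operator.EQUAL)])]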

Only numeric values support AND for the same key. String values support only OR for the same key.

("Google for Developers" in tags) and (publish_date >= 20230314)

metadata_filter = [
    glm.MetadataFilter(
        key="chunk.custom_metadata.tags",
        conditions=[
            glm.Condition(string_value="Google for Developers",
                          operation=glm.Condition.Operator.INCLUDES)]),
    glm.MetadataFilter(
        key="chunk.custom_metadata.publish_date",
        conditions=[
            glm.Condition(numeric_value=20230314,
                          operation=glm.Condition.Operator.GREATER_EQUAL)])]
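The combination rules above can be illustrated with a small local evaluator: AND across MetadataFilter objects, and OR across the conditions inside one filter. This is a sketch of the documented semantics applied to plain dicts, not the service's implementation. passes_filters and condition_matches are hypothetical names, only a subset of operators is modeled, and treating a missing key as a non-match is an assumption.

```python
import operator

# Subset of comparison operators, named after glm.Condition.Operator.
# Assumed semantics: `metadata_value <op> condition_value`.
_COMPARISONS = {
    "LESS": operator.lt,
    "LESS_EQUAL": operator.le,
    "EQUAL": operator.eq,
    "GREATER_EQUAL": operator.ge,
    "GREATER": operator.gt,
}


def condition_matches(value, op: str, target) -> bool:
    """Evaluate a single condition against one metadata value."""
    if op == "INCLUDES":
        # INCLUDES checks membership in a string-list value.
        return isinstance(value, list) and target in value
    return _COMPARISONS[op](value, target)


def passes_filters(metadata: dict, filters: list) -> bool:
    """Logical AND across filters, logical OR across one filter's conditions."""
    for metadata_filter in filters:
        key = metadata_filter["key"]
        if key not in metadata:
            # Assumption: an entry without the key does not match.
            return False
        if not any(
            condition_matches(metadata[key], c["operation"], c["value"])
            for c in metadata_filter["conditions"]
        ):
            return False
    return True


# (year >= 2020 OR year < 2010) AND (genre = drama OR genre = action)
example_filters = [
    {"key": "year", "conditions": [
        {"operation": "GREATER_EQUAL", "value": 2020},
        {"operation": "LESS", "value": 2010}]},
    {"key": "genre", "conditions": [
        {"operation": "EQUAL", "value": "drama"},
        {"operation": "EQUAL", "value": "action"}]},
]
print(passes_filters({"year": 2021, "genre": "drama"}, example_filters))  # True
print(passes_filters({"year": 2015, "genre": "drama"}, example_filters))  # False
```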

user_query = "What is the purpose of Project IDX?"
results_count = 5

# Add a metadata filter on chunk metadata.
chunk_metadata_filter = glm.MetadataFilter(key='chunk.custom_metadata.tags',
                                           conditions=[glm.Condition(
                                              string_value='Google For Developers',
                                              operation=glm.Condition.Operator.INCLUDES)])

# Make the request
# corpus_resource_name is a variable set in the "Create a corpus" section.
request = glm.QueryCorpusRequest(name=corpus_resource_name,
                                 query=user_query,
                                 results_count=results_count,
                                 metadata_filters=[chunk_metadata_filter])
query_corpus_response = retriever_service_client.query_corpus(request)
print(query_corpus_response)

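Attributed Question-Answering

Use the GenerateAnswer method to perform Attributed Question-Answering over your document, corpus, or a set of passages.

Attributed Question-Answering (AQA) means answering questions grounded in a given context and providing attributions, while minimizing hallucination.

GenerateAnswer offers several advantages over an untuned LLM when AQA is what you need:

The underlying model has been trained to return only answers that are grounded in the supplied context.
It identifies attributions: the segments of the supplied context that contributed to the answer. Attributions let the user verify the answer.
It estimates the answerable_probability for a given (question, context) pair, so you can adjust product behavior according to how likely the returned answer is to be grounded and correct.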

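`answerable_probability` and the "I don't know" problem

Sometimes the best response to a question is "I don't know". For example, if the provided context does not contain the answer, the question is considered "unanswerable".

The AQA model is highly adept at recognizing such cases. It can even distinguish between degrees of answerability and unanswerability.

However, the GenerateAnswer API leaves the final decision to you by:

Always attempting to return a grounded answer, even when that answer is relatively unlikely to be grounded and correct.
Returning a value, answerable_probability: the model's estimate of the probability that the answer is grounded and correct.

A low answerable_probability may be explained by one or more of the following factors:

The model is not confident that its answer is correct.
The model is not confident that its answer is grounded in the cited passages, and the answer may instead come from world knowledge. For example: question="1+1=?", passages=["2+2=4"] → answer=2, answerable_probability=0.02
The model provided relevant information that does not completely answer the question. For example: question="Is it available in my size?", passages=["Available in sizes 5-11"] → answer="Yes it is available in sizes 5-11", answerable_probability=0.03
The question in the GenerateAnswerRequest was not well-formed.

A low answerable_probability indicates that GenerateAnswerResponse.answer is likely wrong or ungrounded. We strongly recommend inspecting answerable_probability before processing the response further.

When answerable_probability is low, some clients may want to:

Display a message such as "couldn't answer that question" to the end user.
Fall back to a general-purpose LLM that answers the question from world knowledge. The threshold and nature of such fallbacks depend on the individual use case. An answerable_probability value <= 0.5 is a good starting threshold.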
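The fallback logic above can be expressed as a small routing function. In this sketch, AqaResult is a hypothetical stand-in for the fields you would read from a GenerateAnswerResponse, assumed here to be the answer text and answerable_probability. The 0.5 cutoff is the suggested starting threshold, and you should tune it for your use case.

```python
from dataclasses import dataclass

FALLBACK_MESSAGE = "Sorry, I couldn't answer that question from the provided sources."


@dataclass
class AqaResult:
    """Hypothetical, simplified view of a GenerateAnswerResponse."""
    answer_text: str
    answerable_probability: float


def choose_reply(result: AqaResult, threshold: float = 0.5) -> str:
    """Return the grounded answer, or a fallback message if it is likely unanswerable.

    Values at or below `threshold` count as low, matching the suggested
    starting threshold of answerable_probability <= 0.5.
    """
    if result.answerable_probability <= threshold:
        return FALLBACK_MESSAGE
    return result.answer_text


# Mirrors the "1+1=?" example above: a low probability triggers the fallback.
print(choose_reply(AqaResult(answer_text="2", answerable_probability=0.02)))
```

Instead of returning FALLBACK_MESSAGE, the low-probability branch could call a general-purpose model via GenerateContent, as described above.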

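Helpful AQA tips

For the full API specification, see the GenerateAnswerRequest API reference.

Passage length: Up to 300 tokens per passage is recommended.
Passage sorting:
- If you provide GenerateAnswerRequest.inline_passages, sort the passages in decreasing order of relevance to the query. If the model's context length limit is exceeded, the last (least relevant) passages are omitted.
- If you provide GenerateAnswerRequest.semantic_retriever, relevance sorting is done for you automatically.
Limitations: The AQA model is specialized for question answering. For other use cases, such as creative writing or summarization, call a general-purpose model via GenerateContent.
- Chat: If the user input is known to be a question that may be answerable from a certain context, AQA can answer chat queries. If the user input may be any type of entry, a general-purpose model may be a better choice.
Temperature:
- Generally, a relatively low temperature (~0.2) is recommended for accurate AQA.
- If your use case relies on deterministic outputs, set temperature=0.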
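When you pass inline_passages yourself, you can apply the sorting and length tips above before building the request. This sketch makes several assumptions. You already have a relevance score per passage, and how you compute it is up to you. Whitespace-separated words serve as a rough proxy for tokens, and the 225-word default is a hypothetical budget, not an API limit. String IDs follow the "001", "002" style used in the inline-passages example later in this guide. prepare_inline_passages is a hypothetical helper.

```python
def prepare_inline_passages(scored_passages, max_words=225):
    """Order (score, text) pairs by descending relevance and trim long passages.

    Returns (id, text) pairs with IDs "001", "002", ... in relevance order.
    `max_words` is a hypothetical word budget standing in for ~300 tokens.
    """
    # sorted() is stable, so equally scored passages keep their input order.
    ordered = sorted(scored_passages, key=lambda pair: pair[0], reverse=True)
    prepared = []
    for index, (_, text) in enumerate(ordered, start=1):
        words = text.split()
        prepared.append((f"{index:03d}", " ".join(words[:max_words])))
    return prepared


print(prepare_inline_passages([
    (0.2, "Hallucination is a common LLM problem."),
    (0.9, "AQA answers questions grounded in a corpus and cites sources."),
]))
```

Each (id, text) pair could then be wrapped as glm.GroundingPassage(content=glm.Content(parts=[glm.Part(text=text)]), id=id).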

user_query = "What is the purpose of Project IDX?"
answer_style = "ABSTRACTIVE" # Or VERBOSE, EXTRACTIVE
MODEL_NAME = "models/aqa"

# Make the request
# corpus_resource_name is a variable set in the "Create a corpus" section.
content = glm.Content(parts=[glm.Part(text=user_query)])
retriever_config = glm.SemanticRetrieverConfig(source=corpus_resource_name, query=content)
req = glm.GenerateAnswerRequest(model=MODEL_NAME,
                                contents=[content],
                                semantic_retriever=retriever_config,
                                answer_style=answer_style)
aqa_response = generative_service_client.generate_answer(req)
print(aqa_response)

# Get the metadata from the first attributed passage for the source.
# Guard against an empty list in case no passage was attributed.
attributions = aqa_response.answer.grounding_attributions
if attributions:
  chunk_resource_name = attributions[0].source_id.semantic_retriever_chunk.chunk
  get_chunk_response = retriever_service_client.get_chunk(name=chunk_resource_name)
  print(get_chunk_response)

Other options: AQA with inline passages

Alternatively, you can call the AQA endpoint directly, without the Semantic Retriever API, by passing inline_passages.

user_query = "What is AQA from Google?"
user_query_content = glm.Content(parts=[glm.Part(text=user_query)])
answer_style = "VERBOSE" # or ABSTRACTIVE, EXTRACTIVE
MODEL_NAME = "models/aqa"

# Create the grounding inline passages
grounding_passages = glm.GroundingPassages()
passage_a = glm.Content(parts=[glm.Part(text="Attributed Question and Answering (AQA) refers to answering questions grounded to a given corpus and providing citation")])
grounding_passages.passages.append(glm.GroundingPassage(content=passage_a, id="001"))
passage_b = glm.Content(parts=[glm.Part(text="An LLM is not designed to generate content grounded in a set of passages. Although instructing an LLM to answer questions only based on a set of passages reduces hallucination, hallucination still often occurs when LLMs generate responses unsupported by facts provided by passages")])
grounding_passages.passages.append(glm.GroundingPassage(content=passage_b, id="002"))
passage_c = glm.Content(parts=[glm.Part(text="Hallucination is one of the biggest problems in Large Language Models (LLM) development. Large Language Models (LLMs) could produce responses that are fictitious and incorrect, which significantly impacts the usefulness and trustworthiness of applications built with language models.")])
grounding_passages.passages.append(glm.GroundingPassage(content=passage_c, id="003"))

# Create the request
req = glm.GenerateAnswerRequest(model=MODEL_NAME,
                                contents=[user_query_content],
                                inline_passages=grounding_passages,
                                answer_style=answer_style)
aqa_response = generative_service_client.generate_answer(req)
print(aqa_response)

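Share the corpus

You can share the corpus with others using the CreatePermissionRequest API.

Constraints:

There are 2 roles for sharing: READER and WRITER.
- READER can query the corpus.
- WRITER has READER's permissions and can also edit and share the corpus.
A corpus can be made public by granting read access with EVERYONE as the user_type.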

# Replace TODO-your-email@gmail.com with the email added as a test user in the OAuth Quickstart
shared_user_email = "TODO-your-email@gmail.com" #  @param {type:"string"}
user_type = "USER"
role = "READER"

# Make the request
# corpus_resource_name is a variable set in the "Create a corpus" section.
request = glm.CreatePermissionRequest(
    parent=corpus_resource_name,
    permission=glm.Permission(grantee_type=user_type,
                              email_address=shared_user_email,
                              role=role))
create_permission_response = permission_service_client.create_permission(request)
print(create_permission_response)

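Delete the corpus

Use DeleteCorpusRequest to delete a user corpus and all of its associated Documents and Chunks.

If force=False (the default) and the corpus contains any Documents, the request fails with a FAILED_PRECONDITION error.

If you set force=True, all Documents, Chunks, and other objects related to this Corpus are deleted as well.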

# Set force to False if you don't want to delete non-empty corpora.
req = glm.DeleteCorpusRequest(name=corpus_resource_name, force=True)
delete_corpus_response = retriever_service_client.delete_corpus(req)
print("Successfully deleted corpus: " + corpus_resource_name)

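Summary and further reading

This guide introduced the Semantic Retriever and Attributed Question & Answering (AQA) APIs of the Generative Language API. It showed how to use them to perform semantic information retrieval on your own text data. This API also works with the LlamaIndex data framework; see the tutorial to learn more.

See also the API docs to learn about the other available functionality.

Appendix: Set up OAuth with user credentials

Follow the steps below from the OAuth Quickstart to set up OAuth authentication.

Configure the OAuth consent screen.
Authorize credentials for a desktop application. To run this notebook in Colab, first rename your credential file (usually client_secret_*.json) to client_secret.json. Then upload the file by clicking the file icon in the left sidebar and then the upload icon, as shown in the screenshot below.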

# Replace TODO-your-project-name with the project used in the OAuth Quickstart
project_name = "TODO-your-project-name" #  @param {type:"string"}
# Replace TODO-your-email@gmail.com with the email added as a test user in the OAuth Quickstart
email = "TODO-your-email@gmail.com" #  @param {type:"string"}
# Rename the uploaded file to `client_secret.json` OR
# Change the variable `client_file_name` in the code below.
client_file_name = "client_secret.json"

# IMPORTANT: Follow the instructions from the output - you must copy the command
# to your terminal and copy the output after authentication back here.
!gcloud config set project $project_name
!gcloud config set account $email

# NOTE: The simplified project setup in this tutorial triggers a "Google hasn't verified this app." dialog.
# This is normal, click "Advanced" -> "Go to [app name] (unsafe)"
!gcloud auth application-default login --no-browser --client-id-file=$client_file_name --scopes="https://www.googleapis.com/auth/generative-language.retriever,https://www.googleapis.com/auth/cloud-platform"

Initialize the client library, then rerun the notebook starting from "Create a corpus".

import google.ai.generativelanguage as glm

generative_service_client = glm.GenerativeServiceClient()
retriever_service_client = glm.RetrieverServiceClient()
permission_service_client = glm.PermissionServiceClient()