開始使用 Gemma 和 LangChain

前往 ai.google.dev 查看 在 Google Colab 中執行 在 GitHub 上查看原始碼

這個教學課程說明如何開始使用在 Google Cloud 或 Colab 環境中運作的 GemmaLangChain。Gemma 是一系列最先進的開放式模型,採用與建立 Gemini 模型相同的研究和技術打造而成。LangChain 是一種架構,可用於建構及部署以語言模型為基礎的情境感知應用程式。

在 Google Cloud 中執行 Gemma

langchain-google-vertexai 套件提供 LangChain 與 Google Cloud 模型整合服務。

安裝依附元件

pip install --upgrade -q langchain langchain-google-vertexai

驗證

除非你使用的是 Colab Enterprise,否則必須通過驗證。

from google.colab import auth
auth.authenticate_user()

部署模型

Vertex AI 是訓練及部署 AI 模型和應用程式的平台。Model Garden 是一組精選模型,您可以在 Google Cloud 控制台中探索。

如要部署 Gemma,請在 Vertex AI 適用的 Model Garden 中開啟模型,並完成下列步驟:

  1. 選取 [Deploy] (部署)。
  2. 對部署表單欄位進行所需變更,或者保留預設值 (如果預設值沒問題)。記下下列欄位,稍後會用到:
    • 端點名稱 (例如 google_gemma-7b-it-mg-one-click-deploy)
    • 區域 (例如 us-west1)
  3. 選取「部署」,將模型部署至 Vertex AI。部署需要幾分鐘才能完成。

端點準備就緒後,請複製其專案 ID、端點 ID 和位置,然後輸入這些內容做為參數。

# @title Basic parameters
project: str = ""  # @param {type:"string"}
endpoint_id: str = ""  # @param {type:"string"}
location: str = "" # @param {type:"string"}

執行模型

from langchain_google_vertexai import GemmaVertexAIModelGarden, GemmaChatVertexAIModelGarden

llm = GemmaVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

output = llm.invoke("What is the meaning of life?")
print(output)
Prompt:
What is the meaning of life?
Output:
Life is a complex and multifaceted phenomenon that has fascinated philosophers, scientists, and

您也可以使用 Gemma 進行多輪即時通訊:

from langchain_core.messages import (
    HumanMessage
)

llm = GemmaChatVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

message1 = HumanMessage(content="How much is 2+2?")
answer1 = llm.invoke([message1])
print(answer1)

message2 = HumanMessage(content="How much is 3+3?")
answer2 = llm.invoke([message1, answer1, message2])

print(answer2)
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4'
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nPrompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4<end_of_turn>\n<start_of_turn>user\nHow much is 3+3?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 6.\n\n3 + 3 = 6'

您可以後置處理回覆,以免重複:

answer1 = llm.invoke([message1], parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], parse_response=True)

print(answer2)
content='Output:\nSure, here is the answer:\n\n2 + 2 = 4'
content='Output:\nSure, here is the answer:\n\n3 + 3 = 6<'

透過 Kaggle 下載執行 Gemma

本節說明如何從 Kaggle 下載 Gemma,並執行模型。

如要完成這個部分,您必須先在 Gemma 設定中完成設定。

然後進行下一節,您將為 Colab 環境設定環境變數。

設定環境變數

設定 KAGGLE_USERNAMEKAGGLE_KEY 的環境變數。

import os
from google.colab import userdata

# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')

安裝依附元件

# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
pip install -q -U keras-nlp
pip install -q -U keras>=3

執行模型

from langchain_google_vertexai import GemmaLocalKaggle

您可以指定 Keras 後端 (預設為 tensorflow,但可變更為 jaxtorch)。

# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}
llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
output = llm.invoke("What is the meaning of life?", max_tokens=30)
print(output)
What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has

執行聊天模型

如上方 Google Cloud 範例所示,您可以使用 Gemma 的本機部署進行多輪即時通訊。為避免 OOM 錯誤,您可能需要重新啟動筆記本並清除 GPU 記憶體:

from langchain_google_vertexai import GemmaChatLocalKaggle
# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}
llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=30)
print(answer1)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model."
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)

print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model.<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model"

如果您想避免多輪陳述式,可以後續處理回應:

answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)
print(answer2)
content="I'm a model.\n Tampoco\nI'm a model."
content='I can help you with your modeling.\n Tampoco\nI can'

透過 Hugging Face 下載項目執行 Gemma

設定

和 Kaggle 一樣,Hugging Face 也規定,你必須先接受 Gemma 條款及細則,才能存取模型。如要透過 Hugging Face 存取 Gemma,請前往 Gemma 模型資訊卡

您也可以在下方輸入具備讀取權限的使用者存取權杖

# @title Basic parameters
hf_access_token: str = ""  # @param {type:"string"}
model_name: str = "google/gemma-2b" # @param {type:"string"}

執行模型

from langchain_google_vertexai import GemmaLocalHF, GemmaChatLocalHF
llm = GemmaLocalHF(model_name="google/gemma-2b", hf_access_token=hf_access_token)
tokenizer_config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]
tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]
tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]
special_tokens_map.json:   0%|          | 0.00/555 [00:00<?, ?B/s]
config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]
model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]
Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]
model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]
model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]
output = llm.invoke("What is the meaning of life?", max_tokens=50)
print(output)
What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has been asked by philosophers, theologians, and scientists for centuries.

And it’s the question that

如以上範例所示,您可以在本機部署 Gemma 進行多輪即時通訊。為避免 OOM 錯誤,您可能需要重新啟動筆記本並清除 GPU 記憶體:

執行聊天模型

llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=60)
print(answer1)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean"
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)

print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model\nI can help you with anything.\n<"

如之前的範例一樣,您可以後置回應:

answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)
print(answer2)
content="I'm a model.\n<end_of_turn>\n"
content='I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n'

後續步驟