前往 Gemma Cookbook 存放區取得產生與調整範例！瞭解詳情

本頁面由 Cloud Translation API 翻譯而成。

開始使用 Gemma 和 LangChain

本教學課程說明如何開始使用 Gemma 和 LangChain，並在 Google Cloud 或 Colab 環境中執行。Gemma 是一系列先進的輕量級開放式模型，採用與建立 Gemini 模型時相同的研究成果和技術。LangChain 是一種架構，可用於建構及部署以語言模型為後盾的情境感知應用程式。

在 Google Cloud 中執行 Gemma

langchain-google-vertexai 套件可讓 LangChain 與 Google Cloud 模型整合。

安裝依附元件

pip install --upgrade -q langchain langchain-google-vertexai

驗證

除非您使用的是 Colab Enterprise，否則必須進行驗證。

from google.colab import auth
auth.authenticate_user()

部署模型

Vertex AI 是一個訓練及部署 AI 模型和應用程式的平台。Model Garden 是精選的模型集合，您可以在 Google Cloud 控制台中探索。

如要部署 Gemma，請在 Vertex AI 的 Model Garden 中開啟模型，然後完成下列步驟：

選取 [Deploy] (部署)。
視需要變更部署表單欄位，如果您滿意預設值，則可保留原狀。請記下下列欄位，稍後會用到：
- 端點名稱 (例如 google_gemma-7b-it-mg-one-click-deploy)
- 區域 (例如 us-west1)
選取「Deploy」，將模型部署至 Vertex AI。部署作業需要幾分鐘才會完成。

端點準備就緒後，請複製其專案 ID、端點 ID 和位置，並將這些資訊做為參數輸入。

# @title Basic parameters
project: str = ""  # @param {type:"string"}
endpoint_id: str = ""  # @param {type:"string"}
location: str = "" # @param {type:"string"}

執行模型

from langchain_google_vertexai import GemmaVertexAIModelGarden, GemmaChatVertexAIModelGarden

llm = GemmaVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

output = llm.invoke("What is the meaning of life?")
print(output)

Prompt:
What is the meaning of life?
Output:
Life is a complex and multifaceted phenomenon that has fascinated philosophers, scientists, and

您也可以使用 Gemma 進行多回合聊天：

from langchain_core.messages import (
    HumanMessage
)

llm = GemmaChatVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

message1 = HumanMessage(content="How much is 2+2?")
answer1 = llm.invoke([message1])
print(answer1)

message2 = HumanMessage(content="How much is 3+3?")
answer2 = llm.invoke([message1, answer1, message2])

print(answer2)

content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4'
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nPrompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4<end_of_turn>\n<start_of_turn>user\nHow much is 3+3?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 6.\n\n3 + 3 = 6'

您可以對回覆進行後續處理，避免重複：

answer1 = llm.invoke([message1], parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], parse_response=True)

print(answer2)

content='Output:\nSure, here is the answer:\n\n2 + 2 = 4'
content='Output:\nSure, here is the answer:\n\n3 + 3 = 6<'

從 Kaggle 下載的檔案執行 Gemma

本節將說明如何從 Kaggle 下載 Gemma，然後執行模型。

如要完成本節，您必須先完成 Gemma 設定的設定說明。

然後繼續閱讀下一節，瞭解如何設定 Colab 環境的環境變數。

設定環境變數

設定 KAGGLE_USERNAME 和 KAGGLE_KEY 的環境變數。

import os
from google.colab import userdata

# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')

安裝依附元件

# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
pip install -q -U keras-nlp
pip install -q -U keras>=3

執行模型

from langchain_google_vertexai import GemmaLocalKaggle

您可以指定 Keras 後端 (預設為 tensorflow，但您可以將其變更為 jax 或 torch)。

# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}

llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...

output = llm.invoke("What is the meaning of life?", max_tokens=30)
print(output)

What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has

執行即時通訊模型

如同上述 Google Cloud 範例，您可以使用 Gemma 的本機部署進行多回合聊天。您可能需要重新啟動筆記本並清除 GPU 記憶體，以免發生 OOM 錯誤：

from langchain_google_vertexai import GemmaChatLocalKaggle

# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}

llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...

from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=30)
print(answer1)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model."

message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)

print(answer2)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model.<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model"

如要避免多回合陳述，您可以對回應進行後置處理：

answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)
print(answer2)

content="I'm a model.\n Tampoco\nI'm a model."
content='I can help you with your modeling.\n Tampoco\nI can'

透過 Hugging Face 下載執行 Gemma

設定

和 Kaggle 一樣，Hugging Face 要求您先接受 Gemma 的條款及細則，才能存取模型。如要透過 Hugging Face 存取 Gemma，請前往 Gemma 模型資訊卡。

您還需要取得具備讀取權限的使用者存取權杖，請在下方輸入這項資訊。

# @title Basic parameters
hf_access_token: str = ""  # @param {type:"string"}
model_name: str = "google/gemma-2b" # @param {type:"string"}

執行模型

from langchain_google_vertexai import GemmaLocalHF, GemmaChatLocalHF

llm = GemmaLocalHF(model_name="google/gemma-2b", hf_access_token=hf_access_token)

tokenizer_config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]
tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]
tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]
special_tokens_map.json:   0%|          | 0.00/555 [00:00<?, ?B/s]
config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]
model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]
Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]
model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]
model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

output = llm.invoke("What is the meaning of life?", max_tokens=50)
print(output)

What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has been asked by philosophers, theologians, and scientists for centuries.

And it’s the question that

如上述範例所示，您可以使用 Gemma 的本機部署作業進行多回合聊天。您可能需要重新啟動筆記本並清除 GPU 記憶體，以免發生 OOM 錯誤：

執行即時通訊模型

llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=60)
print(answer1)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean"

message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)

print(answer2)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model\nI can help you with anything.\n<"

如同先前的範例，您可以對回應進行後置處理：

answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)
print(answer2)

content="I'm a model.\n<end_of_turn>\n"
content='I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n'

開始使用 Gemma 和 LangChain

在 Google Cloud 中執行 Gemma

安裝依附元件

驗證

部署模型

執行模型

從 Kaggle 下載的檔案執行 Gemma

設定環境變數

安裝依附元件

執行模型

執行即時通訊模型

透過 Hugging Face 下載執行 Gemma

設定

執行模型

執行即時通訊模型

後續步驟