This tutorial shows you how to get started with Gemma and LangChain, running in Google Cloud or in your Colab environment. Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. LangChain is a framework for building and deploying context-aware applications backed by language models.
Running Gemma on Google Cloud
The langchain-google-vertexai package provides LangChain integration with Google Cloud models.
Install dependencies
pip install --upgrade -q langchain langchain-google-vertexai
Authentication
Unless you're using Colab Enterprise, you need to authenticate.
from google.colab import auth
auth.authenticate_user()
Deploy the model
Vertex AI is a platform for training and deploying AI models and applications. Model Garden is a curated collection of models that you can explore in the Google Cloud console.
To deploy Gemma, open the model in Model Garden for Vertex AI and complete the following steps:
- Select Deploy.
- Change the deployment form fields as needed, or leave them as-is if you're okay with the defaults. Note the following fields, which you'll need later:
  - Endpoint name (for example, google_gemma-7b-it-mg-one-click-deploy)
  - Region (for example, us-west1)
- Select Deploy to deploy the model to Vertex AI. The deployment will take a few minutes to complete.
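If you'd rather look up the endpoint details programmatically instead of copying them from the console, you can list your endpoints with the Vertex AI SDK. This is a minimal sketch, not part of the original notebook; it assumes the google-cloud-aiplatform package is installed and that you've already authenticated:
from google.cloud import aiplatform

# Initialize the SDK with your project and the region you deployed to
# (placeholder values; replace with your own).
aiplatform.init(project="my-project", location="us-west1")

# List deployed endpoints; `name` is the endpoint ID used in the parameters below.
for endpoint in aiplatform.Endpoint.list():
    print(endpoint.display_name, endpoint.name)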
Once the endpoint is ready, copy its project ID, endpoint ID, and location, and enter them as parameters.
# @title Basic parameters
project: str = "" # @param {type:"string"}
endpoint_id: str = "" # @param {type:"string"}
location: str = "" # @param {type:"string"}
Run the model
from langchain_google_vertexai import GemmaVertexAIModelGarden, GemmaChatVertexAIModelGarden
llm = GemmaVertexAIModelGarden(
endpoint_id=endpoint_id,
project=project,
location=location,
)
output = llm.invoke("What is the meaning of life?")
print(output)
Prompt: What is the meaning of life? Output: Life is a complex and multifaceted phenomenon that has fascinated philosophers, scientists, and
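Because GemmaVertexAIModelGarden is a standard LangChain LLM, you can also compose it with other LangChain components. The following is a minimal sketch (not part of the original notebook) that pairs the llm object created above with a prompt template using the LangChain expression language:
from langchain_core.prompts import PromptTemplate

# Build a simple prompt -> model chain with the LangChain expression language.
prompt = PromptTemplate.from_template("Answer briefly: {question}")
chain = prompt | llm

print(chain.invoke({"question": "What is the meaning of life?"}))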
You can also use Gemma for multi-turn chat:
from langchain_core.messages import (
HumanMessage
)
llm = GemmaChatVertexAIModelGarden(
endpoint_id=endpoint_id,
project=project,
location=location,
)
message1 = HumanMessage(content="How much is 2+2?")
answer1 = llm.invoke([message1])
print(answer1)
message2 = HumanMessage(content="How much is 3+3?")
answer2 = llm.invoke([message1, answer1, message2])
print(answer2)
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4' content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nPrompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4<end_of_turn>\n<start_of_turn>user\nHow much is 3+3?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 6.\n\n3 + 3 = 6'
You can post-process the response to avoid repetitions:
answer1 = llm.invoke([message1], parse_response=True)
print(answer1)
answer2 = llm.invoke([message1, answer1, message2], parse_response=True)
print(answer2)
content='Output:\nSure, here is the answer:\n\n2 + 2 = 4' content='Output:\nSure, here is the answer:\n\n3 + 3 = 6<'
Running Gemma from a Kaggle download
This section describes how to download Gemma from Kaggle and then run the model.
To complete this section, you'll first need to complete the setup instructions at Gemma setup. Then move on to the next section, where you'll set environment variables for your Colab environment.
Set environment variables
Set environment variables for KAGGLE_USERNAME and KAGGLE_KEY.
import os
from google.colab import userdata
# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
Install dependencies
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
pip install -q -U keras-nlp
pip install -q -U keras>=3
Run the model
from langchain_google_vertexai import GemmaLocalKaggle
You can specify the Keras backend (by default it's tensorflow, but you can change it to jax or torch).
# @title Basic parameters
keras_backend: str = "jax" # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}
llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
output = llm.invoke("What is the meaning of life?", max_tokens=30)
print(output)
What is the meaning of life? The question is one of the most important questions in the world. It’s the question that has
Run the chat model
As in the Google Cloud example above, you can use a local deployment of Gemma for multi-turn chat. You might need to restart the notebook and clean your GPU memory in order to avoid OOM errors:
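One way to free accelerator memory without restarting the notebook, if you're running on the torch backend, is to drop the old model and clear PyTorch's CUDA cache. This is a sketch, not part of the original notebook; for other backends, restarting the runtime remains the reliable option:
import gc
import torch

# Drop references to the previously loaded model, force garbage collection,
# and release GPU memory cached by PyTorch.
del llm
gc.collect()
torch.cuda.empty_cache()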
from langchain_google_vertexai import GemmaChatLocalKaggle
# @title Basic parameters
keras_backend: str = "jax" # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}
llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
from langchain_core.messages import (
HumanMessage
)
message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=30)
print(answer1)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model."
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)
print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model.<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model"
If you want to avoid multi-turn statements, you can post-process the response:
answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)
print(answer1)
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)
print(answer2)
content="I'm a model.\n Tampoco\nI'm a model." content='I can help you with your modeling.\n Tampoco\nI can'
Running Gemma from a Hugging Face download
Setup
Like Kaggle, Hugging Face requires that you accept the Gemma terms and conditions before accessing the model. To get access to Gemma through Hugging Face, go to the Gemma model card.
You'll also need to get a user access token with read permissions, which you can enter below.
# @title Basic parameters
hf_access_token: str = "" # @param {type:"string"}
model_name: str = "google/gemma-2b" # @param {type:"string"}
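If you left hf_access_token empty above, you can instead read the token from Colab secrets, the same way the Kaggle credentials were read earlier. This is a sketch and assumes a secret named HF_ACCESS_TOKEN exists in your Colab environment:
from google.colab import userdata

# Read the Hugging Face token from Colab secrets instead of the form field.
hf_access_token = userdata.get('HF_ACCESS_TOKEN')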
Run the model
from langchain_google_vertexai import GemmaLocalHF, GemmaChatLocalHF
llm = GemmaLocalHF(model_name="google/gemma-2b", hf_access_token=hf_access_token)
tokenizer_config.json: 0%| | 0.00/1.11k [00:00<?, ?B/s] tokenizer.model: 0%| | 0.00/4.24M [00:00<?, ?B/s] tokenizer.json: 0%| | 0.00/17.5M [00:00<?, ?B/s] special_tokens_map.json: 0%| | 0.00/555 [00:00<?, ?B/s] config.json: 0%| | 0.00/627 [00:00<?, ?B/s] model.safetensors.index.json: 0%| | 0.00/13.5k [00:00<?, ?B/s] Downloading shards: 0%| | 0/2 [00:00<?, ?it/s] model-00001-of-00002.safetensors: 0%| | 0.00/4.95G [00:00<?, ?B/s] model-00002-of-00002.safetensors: 0%| | 0.00/67.1M [00:00<?, ?B/s] Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s] generation_config.json: 0%| | 0.00/137 [00:00<?, ?B/s]
output = llm.invoke("What is the meaning of life?", max_tokens=50)
print(output)
What is the meaning of life? The question is one of the most important questions in the world. It’s the question that has been asked by philosophers, theologians, and scientists for centuries. And it’s the question that
Run the chat model
As in the examples above, you can use a local deployment of Gemma for multi-turn chat. You might need to restart the notebook and clean your GPU memory in order to avoid OOM errors:
llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
from langchain_core.messages import (
HumanMessage
)
message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=60)
print(answer1)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean"
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)
print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model\nI can help you with anything.\n<"
As with the previous examples, you can post-process the response:
answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)
print(answer1)
answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)
print(answer2)
content="I'm a model.\n<end_of_turn>\n" content='I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n'
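The chat wrappers are regular LangChain chat models, so you can also drop them into a chain. Here's a minimal sketch (not from the original notebook) that binds the generation parameters to the llm defined above and feeds it from a prompt template:
from langchain_core.prompts import ChatPromptTemplate

# Bind generation kwargs so every invocation is trimmed and post-processed.
chat = llm.bind(max_tokens=60, parse_response=True)

prompt = ChatPromptTemplate.from_messages([("human", "{question}")])
chain = prompt | chat

print(chain.invoke({"question": "Hi! Who are you?"}))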
What's next
- Learn how to finetune a Gemma model.
- Learn how to perform distributed fine-tuning and inference on a Gemma model.
- Learn how to use Gemma models with Vertex AI.