Get started with Gemma and LangChain

This tutorial shows you how to get started with Gemma and LangChain, running in Google Cloud or in your Colab environment. Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. LangChain is a framework for building and deploying context-aware applications backed by language models.

Run Gemma in Google Cloud

The langchain-google-vertexai package provides LangChain integration with Google Cloud models.

Install dependencies

pip install --upgrade -q langchain langchain-google-vertexai

Authenticate

Unless you're using Colab Enterprise, you need to authenticate.

from google.colab import auth
auth.authenticate_user()

Deploy the model

Vertex AI is a platform for training and deploying AI models and applications. Model Garden is a curated collection of models that you can explore in the Google Cloud console.

To deploy Gemma, open the model in Model Garden for Vertex AI and complete the following steps:

Select Deploy.
Make any changes you want to the deployment form fields, or leave them as they are if the defaults are acceptable. Note the following fields, which you'll need later:
- Endpoint name (for example, google_gemma-7b-it-mg-one-click-deploy)
- Region (for example, us-west1)
Select Deploy to deploy the model to Vertex AI. The deployment takes a few minutes to complete.

When the endpoint is ready, copy its project ID, endpoint ID, and location, and enter them as parameters.

# @title Basic parameters
project: str = ""  # @param {type:"string"}
endpoint_id: str = ""  # @param {type:"string"}
location: str = "" # @param {type:"string"}
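Empty or mistyped parameters tend to surface only later as an opaque request error, so it can help to check them up front. The following is a minimal sketch, not part of langchain-google-vertexai: `endpoint_resource_name` is a hypothetical helper, the region pattern is a loose assumption, and the returned string follows the usual Vertex AI endpoint resource-name layout.

```python
import re


def endpoint_resource_name(project: str, location: str, endpoint_id: str) -> str:
    """Validate the three parameters and build the endpoint resource name.

    Hypothetical helper for illustration only.
    """
    for label, value in (
        ("project", project),
        ("location", location),
        ("endpoint_id", endpoint_id),
    ):
        if not value.strip():
            raise ValueError(f"{label} must not be empty")
    # Loose sanity check (an assumption): regions look like "us-west1".
    if not re.fullmatch(r"[a-z]+-[a-z]+\d+", location):
        raise ValueError(f"location does not look like a region: {location!r}")
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"
```

Calling `endpoint_resource_name(project, endpoint_id=endpoint_id, location=location)` before constructing the model raises a `ValueError` as soon as one of the fields was left blank.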

Run the model

from langchain_google_vertexai import GemmaVertexAIModelGarden, GemmaChatVertexAIModelGarden

llm = GemmaVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

output = llm.invoke("What is the meaning of life?")
print(output)

Prompt:
What is the meaning of life?
Output:
Life is a complex and multifaceted phenomenon that has fascinated philosophers, scientists, and

You can also use Gemma for multi-turn chat:

from langchain_core.messages import (
    HumanMessage
)

llm = GemmaChatVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

message1 = HumanMessage(content="How much is 2+2?")
answer1 = llm.invoke([message1])
print(answer1)

message2 = HumanMessage(content="How much is 3+3?")
answer2 = llm.invoke([message1, answer1, message2])

print(answer2)

content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4'
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nPrompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4<end_of_turn>\n<start_of_turn>user\nHow much is 3+3?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 6.\n\n3 + 3 = 6'
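The raw responses above echo the conversation wrapped in Gemma's turn markers. As a rough illustration of that layout, the sketch below flattens a list of turns into the same marker format seen in the output. `format_gemma_chat` is a hypothetical helper inferred from the printed output, not the library's own implementation.

```python
from typing import List, Tuple

START_OF_TURN = "<start_of_turn>"
END_OF_TURN = "<end_of_turn>"


def format_gemma_chat(turns: List[Tuple[str, str]]) -> str:
    """Flatten (role, text) turns into a single Gemma-style prompt string."""
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"{START_OF_TURN}{role}\n{text}{END_OF_TURN}\n")
    # Leave an open model turn so the model writes the next reply.
    parts.append(f"{START_OF_TURN}model\n")
    return "".join(parts)


print(format_gemma_chat([("user", "How much is 2+2?")]))
```

Because each request carries the full history in this form, the echoed prompt grows with every turn, which is what the second response above shows.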

To avoid the repeated prompt, you can post-process the responses:

answer1 = llm.invoke([message1], parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], parse_response=True)

print(answer2)

content='Output:\nSure, here is the answer:\n\n2 + 2 = 4'
content='Output:\nSure, here is the answer:\n\n3 + 3 = 6<'
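How `parse_response=True` works internally is not shown here. One plausible way to get a similar effect by hand is to keep only the text after the last model turn marker. The sketch below assumes the raw output format shown earlier; `extract_last_reply` is a hypothetical helper.

```python
MODEL_TURN = "<start_of_turn>model\n"


def extract_last_reply(raw: str) -> str:
    """Return the text that follows the last model turn marker.

    If no marker is present, the input is returned unchanged.
    """
    index = raw.rfind(MODEL_TURN)
    if index == -1:
        return raw
    return raw[index + len(MODEL_TURN):]


raw = (
    "Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n"
    "<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4"
)
print(extract_last_reply(raw))
```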

Run Gemma from a Kaggle download

This section shows you how to download Gemma from Kaggle and then run the model.

To complete this section, you first need to follow the setup instructions at Gemma setup.

Then move on to the next section, where you'll set environment variables for your Colab environment.

Set environment variables

Set environment variables for KAGGLE_USERNAME and KAGGLE_KEY.

import os
from google.colab import userdata

# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
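Outside Colab, `userdata.get` is not available, so the two variables have to come from the shell or another secret store. A minimal sketch of a check that reports which of them are still unset before any download is attempted; `missing_kaggle_credentials` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def missing_kaggle_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name)]


missing = missing_kaggle_credentials()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```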

Install dependencies

# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
pip install -q -U keras-nlp
pip install -q -U "keras>=3"

Run the model

from langchain_google_vertexai import GemmaLocalKaggle

You can specify the Keras backend (the default is tensorflow, but you can change it to jax or torch).

# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}

llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
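Keras 3 reads its backend from the `KERAS_BACKEND` environment variable when it is first imported, so the choice has to be made before that import. How the `keras_backend` argument is applied inside the wrapper is not shown here; the sketch below only illustrates the standalone mechanism, and `select_keras_backend` is a hypothetical helper.

```python
import os
from typing import MutableMapping, Optional

SUPPORTED_BACKENDS = ("tensorflow", "jax", "torch")


def select_keras_backend(
    name: str, env: Optional[MutableMapping[str, str]] = None
) -> str:
    """Validate a backend name and record it in KERAS_BACKEND.

    Call this before the first `import keras`; changing the variable
    afterwards has no effect on an already imported Keras.
    """
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unsupported backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    target = os.environ if env is None else env
    target["KERAS_BACKEND"] = name
    return name
```

Switching backends in a running notebook therefore generally requires restarting the runtime.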

output = llm.invoke("What is the meaning of life?", max_tokens=30)
print(output)

What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has

Run the chat model

As in the Google Cloud example above, you can use a local deployment of Gemma for multi-turn chat. You might need to restart your notebook and clear your GPU memory to avoid out-of-memory (OOM) errors:

from langchain_google_vertexai import GemmaChatLocalKaggle

# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}

llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...

from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=30)
print(answer1)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model."

message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)

print(answer2)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model.<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model"

If you want to avoid the multi-turn markup in the output, you can post-process the response:

answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)
print(answer2)

content="I'm a model.\n Tampoco\nI'm a model."
content='I can help you with your modeling.\n Tampoco\nI can'
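Before loading another model in the same session, it can help to drop the reference to the current one, since a model that is still referenced cannot be freed. This is a minimal sketch of that idea using only the standard library; `release` is a hypothetical helper, and accelerator memory may still be held by the framework afterwards, in which case restarting the notebook remains the reliable option.

```python
import gc
from typing import Any, MutableMapping


def release(namespace: MutableMapping[str, Any], name: str) -> None:
    """Remove `name` from a namespace and run a garbage-collection pass."""
    namespace.pop(name, None)  # drop the reference if it exists
    gc.collect()               # also reclaim objects caught in reference cycles


# In a notebook, the namespace is typically globals():
# release(globals(), "llm")
```

Notebook output history can keep additional references alive, which is another reason a restart is sometimes unavoidable.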

Run Gemma from a Hugging Face download

Setup

Like Kaggle, Hugging Face requires that you accept the Gemma terms and conditions before accessing the model. To get access to Gemma through Hugging Face, go to the Gemma model card.

You'll also need a user access token with read permission, which you can enter below.

# @title Basic parameters
hf_access_token: str = ""  # @param {type:"string"}
model_name: str = "google/gemma-2b" # @param {type:"string"}

Run the model

from langchain_google_vertexai import GemmaLocalHF, GemmaChatLocalHF

llm = GemmaLocalHF(model_name=model_name, hf_access_token=hf_access_token)

tokenizer_config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]
tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]
tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]
special_tokens_map.json:   0%|          | 0.00/555 [00:00<?, ?B/s]
config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]
model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]
Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]
model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]
model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

output = llm.invoke("What is the meaning of life?", max_tokens=50)
print(output)

What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has been asked by philosophers, theologians, and scientists for centuries.

And it’s the question that

Run the chat model

As in the examples above, you can use a local deployment of Gemma for multi-turn chat. You might need to restart your notebook and clear your GPU memory to avoid out-of-memory (OOM) errors:

llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=60)
print(answer1)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean"

message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)

print(answer2)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model\nI can help you with anything.\n<"

As in the previous examples, you can post-process the response:

answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)
print(answer2)

content="I'm a model.\n<end_of_turn>\n"
content='I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n'
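The parsed responses above still end with `<end_of_turn>` markers. If those are unwanted in a user-facing string, a small cleanup step can cut the text at the first end marker and remove any stray start markers. This is a sketch under the assumption that everything after the first `<end_of_turn>` is not part of the reply; `strip_turn_markers` is a hypothetical helper.

```python
import re


def strip_turn_markers(text: str) -> str:
    """Keep the text before the first end-of-turn marker, without markers."""
    head = text.split("<end_of_turn>", 1)[0]
    # Remove any leftover start-of-turn markers, with or without a role.
    head = re.sub(r"<start_of_turn>(?:user|model)?\n?", "", head)
    return head.strip()


print(strip_turn_markers("I'm a model.\n<end_of_turn>\n"))
```

Applied to `answer1.content` and `answer2.content`, this leaves only the sentence each reply begins with.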

What's next

Learn how to fine-tune a Gemma model.
Learn how to perform distributed fine-tuning and inference on a Gemma model.
Learn how to use Gemma models with Vertex AI.