This tutorial shows you how to get started with Gemma and LangChain, running on Google Cloud or in your Colab environment. Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. LangChain is a framework for building and deploying context-aware applications backed by language models.
Run Gemma on Google Cloud
The langchain-google-vertexai package provides LangChain integrations for Google Cloud models.
Install dependencies
pip install --upgrade -q langchain langchain-google-vertexai
Authenticate
Unless you're using Colab Enterprise, you need to authenticate.
from google.colab import auth
auth.authenticate_user()
Deploy the model
Vertex AI is a platform for training and deploying AI models and applications. Model Garden is a curated collection of models that you can explore in the Google Cloud console.
To deploy Gemma, open the model in Model Garden for Vertex AI and complete the following steps:
- Select Deploy.
- Make any changes you want to the deployment form fields, or leave them as is if you're happy with the defaults. Note the following fields, which you'll need later:
  - Endpoint name (for example, google_gemma-7b-it-mg-one-click-deploy)
  - Region (for example, us-west1)
- Select Deploy to deploy the model to Vertex AI. The deployment will take a few minutes to complete.
Once the endpoint is ready, copy its project ID, endpoint ID, and location, and enter them as parameters below.
# @title Basic parameters
project: str = "" # @param {type:"string"}
endpoint_id: str = "" # @param {type:"string"}
location: str = "" # @param {type:"string"}
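If you didn't note the endpoint ID during deployment, you can look it up programmatically. This is a minimal sketch using the Vertex AI SDK (google-cloud-aiplatform), which is not installed by the dependencies above, so treat it as an optional convenience rather than part of the tutorial flow:
# Optional: list Vertex AI endpoints to recover the endpoint ID.
# Assumes the google-cloud-aiplatform package is installed.
from google.cloud import aiplatform

aiplatform.init(project=project, location=location)
for endpoint in aiplatform.Endpoint.list():
    # endpoint.name holds the numeric endpoint ID; display_name matches
    # the name shown in Model Garden.
    print(endpoint.display_name, endpoint.name)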
Run the model
from langchain_google_vertexai import GemmaVertexAIModelGarden, GemmaChatVertexAIModelGarden
llm = GemmaVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)
output = llm.invoke("What is the meaning of life?")
print(output)
Prompt: What is the meaning of life? Output: Life is a complex and multifaceted phenomenon that has fascinated philosophers, scientists, and
You can also use Gemma for multi-turn chat:
from langchain_core.messages import HumanMessage
llm = GemmaChatVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)
message1 = HumanMessage(content="How much is 2+2?")
answer1 = llm.invoke([message1])
print(answer1)
message2 = HumanMessage(content="How much is 3+3?")
answer2 = llm.invoke([message1, answer1, message2])
print(answer2)
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4' content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nPrompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4<end_of_turn>\n<start_of_turn>user\nHow much is 3+3?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 6.\n\n3 + 3 = 6'
You can post-process the response to avoid repetition:
answer1 = llm.invoke([message1], parse_response=True)
print(answer1)
answer2 = llm.invoke([message1, answer1, message2], parse_response=True)
print(answer2)
content='Output:\nSure, here is the answer:\n\n2 + 2 = 4' content='Output:\nSure, here is the answer:\n\n3 + 3 = 6<'
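Because GemmaVertexAIModelGarden is a standard LangChain LLM, you can also compose it with other LangChain components. The following sketch (not part of the original notebook) pipes a prompt template into the deployed model using the LangChain expression language:
# Minimal sketch: compose the deployed Gemma LLM with a prompt template.
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Answer briefly: {question}")
chain = prompt | llm  # both are Runnables, so they compose with |
print(chain.invoke({"question": "What is the meaning of life?"}))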
Run Gemma from a Kaggle download
This section shows you how to download Gemma from Kaggle and then run the model.
To complete this section, you'll first need to complete the setup instructions at Gemma setup.
Then move on to the next section, where you'll set environment variables for your Colab environment.
Set environment variables
Set environment variables for KAGGLE_USERNAME and KAGGLE_KEY.
import os
from google.colab import userdata
# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
Install dependencies
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
pip install -q -U keras-nlp
pip install -q -U "keras>=3"  # quoted so the shell doesn't treat >= as a redirect
Run the model
from langchain_google_vertexai import GemmaLocalKaggle
You can specify the Keras backend (the default is tensorflow, but you can change it to jax or torch).
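The keras_backend argument corresponds to Keras 3's backend selection mechanism; independently of this class, you can also pick a backend process-wide by setting the KERAS_BACKEND environment variable, a minimal sketch of which follows:
# Alternative: select the Keras backend for the whole process. This is
# standard Keras 3 behavior and must happen before keras is imported.
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow", "torch"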
# @title Basic parameters
keras_backend: str = "jax" # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}
llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
output = llm.invoke("What is the meaning of life?", max_tokens=30)
print(output)
What is the meaning of life? The question is one of the most important questions in the world. It’s the question that has
Run the chat model
As in the Google Cloud example above, you can use a local deployment of Gemma for multi-turn chat. You might need to restart the notebook and clean your GPU memory to avoid OOM errors:
from langchain_google_vertexai import GemmaChatLocalKaggle
# @title Basic parameters
keras_backend: str = "jax" # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}
llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook... Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
from langchain_core.messages import HumanMessage
message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=30)
print(answer1)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model."
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)
print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model.<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model"
You can post-process the response if you want to avoid multi-turn statements:
answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)
print(answer1)
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)
print(answer2)
content="I'm a model.\n Tampoco\nI'm a model." content='I can help you with your modeling.\n Tampoco\nI can'
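The note above suggests restarting the notebook to avoid OOM errors when loading another model. If you want to try freeing accelerator memory without a full restart, here is a best-effort sketch (restarting the runtime remains the most reliable option):
# Best-effort cleanup before loading a different model.
import gc

del llm       # drop the reference to the loaded model
gc.collect()  # ask Python to release the underlying buffers

# On the torch backend you can additionally release cached GPU memory:
# import torch
# torch.cuda.empty_cache()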
Download Gemma from Hugging Face
Setup
Like Kaggle, Hugging Face requires that you accept the Gemma terms and conditions before accessing the model. To get access to Gemma through Hugging Face, go to the Gemma model card.
You'll also need to get a user access token with read permissions, which you can enter below.
# @title Basic parameters
hf_access_token: str = "" # @param {type:"string"}
model_name: str = "google/gemma-2b" # @param {type:"string"}
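Pasting a token into the form field works for quick experiments, but you can also store it as a Colab secret, mirroring the Kaggle setup above. A minimal sketch, assuming you saved the token under the (hypothetical) secret name HF_TOKEN:
# Optional: read the Hugging Face token from Colab secrets instead of
# the form field above. 'HF_TOKEN' is an assumed secret name.
from google.colab import userdata

hf_access_token = userdata.get('HF_TOKEN')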
Run the model
from langchain_google_vertexai import GemmaLocalHF, GemmaChatLocalHF
llm = GemmaLocalHF(model_name=model_name, hf_access_token=hf_access_token)
tokenizer_config.json: 0%| | 0.00/1.11k [00:00<?, ?B/s] tokenizer.model: 0%| | 0.00/4.24M [00:00<?, ?B/s] tokenizer.json: 0%| | 0.00/17.5M [00:00<?, ?B/s] special_tokens_map.json: 0%| | 0.00/555 [00:00<?, ?B/s] config.json: 0%| | 0.00/627 [00:00<?, ?B/s] model.safetensors.index.json: 0%| | 0.00/13.5k [00:00<?, ?B/s] Downloading shards: 0%| | 0/2 [00:00<?, ?it/s] model-00001-of-00002.safetensors: 0%| | 0.00/4.95G [00:00<?, ?B/s] model-00002-of-00002.safetensors: 0%| | 0.00/67.1M [00:00<?, ?B/s] Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s] generation_config.json: 0%| | 0.00/137 [00:00<?, ?B/s]
output = llm.invoke("What is the meaning of life?", max_tokens=50)
print(output)
What is the meaning of life? The question is one of the most important questions in the world. It’s the question that has been asked by philosophers, theologians, and scientists for centuries. And it’s the question that
Run the chat model
As in the examples above, you can use a local deployment of Gemma for multi-turn chat. You might need to restart the notebook and clean your GPU memory to avoid OOM errors:
llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
from langchain_core.messages import HumanMessage
message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=60)
print(answer1)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean"
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)
print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model\nI can help you with anything.\n<"
As in the previous examples, you can post-process the response:
answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)
print(answer1)
answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)
print(answer2)
content="I'm a model.\n<end_of_turn>\n" content='I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n'
Next steps
- Learn how to finetune a Gemma model.
- Learn how to perform distributed tuning and inference on a Gemma model.
- Learn how to use Gemma models with Vertex AI.