Generate Embeddings with Sentence Transformers

EmbeddingGemma is a lightweight, open embedding model designed for fast, high-quality retrieval on everyday devices such as mobile phones. At only 308 million parameters, it is efficient enough to run advanced AI techniques, such as Retrieval-Augmented Generation (RAG), directly on your local machine with no internet connection required.

Setup

Before starting this tutorial, complete the following steps:

- Get access to Gemma by logging in to Hugging Face and selecting Acknowledge license for a Gemma model.
- Generate a Hugging Face access token and use it to log in from Colab.

This notebook runs on either CPU or GPU.

Install Python packages

Install the libraries required for running the EmbeddingGemma model and generating embeddings. Sentence Transformers is a Python framework for text and image embeddings. For more information, see the Sentence Transformers documentation.

pip install -U sentence-transformers git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview

After you have accepted the license, you need a valid Hugging Face token to access the model.

# Log in to the Hugging Face Hub
from huggingface_hub import login
login()

Load Model

Use the sentence-transformers library to create an instance of a model class with EmbeddingGemma.

import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "google/embeddinggemma-300M"
model = SentenceTransformer(model_id).to(device=device)

print(f"Device: {model.device}")
print(model)
print("Total number of parameters in the model:", sum([p.numel() for _, p in model.named_parameters()]))

Device: cuda:0
SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
Total number of parameters in the model: 307581696
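
The printed module list can be read as a pipeline: the transformer produces one vector per token, the Pooling module averages them, two bias-free Dense layers project 768 → 3072 → 768, and Normalize rescales the result to unit length. The following is a minimal NumPy sketch of those post-transformer steps only. It assumes random stand-in weights and random token vectors, and the function name `pool_project_normalize` is hypothetical; it illustrates the shapes involved and is not the library's implementation.

```python
import numpy as np

def pool_project_normalize(token_embeddings, attention_mask, w_up, w_down):
    """Mean-pool token vectors, apply two bias-free linear layers, L2-normalize."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    # Mean pooling over real (non-padding) tokens only.
    pooled = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)
    # Dense 768 -> 3072, then Dense 3072 -> 768, both with identity activation.
    projected = pooled @ w_up @ w_down
    # Rescale every row to unit length.
    return projected / np.linalg.norm(projected, axis=1, keepdims=True)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 768))             # stand-in for transformer output
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
w_up = rng.normal(size=(768, 3072)) * 0.02        # random stand-in weights
w_down = rng.normal(size=(3072, 768)) * 0.02

sentence_vectors = pool_project_normalize(tokens, mask, w_up, w_down)
print(sentence_vectors.shape)
print(np.linalg.norm(sentence_vectors, axis=1))
```

Under these shapes, the two projection layers account for 2 × 768 × 3072 = 4,718,592 of the parameters counted above; the rest belong to the transformer.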

Generating Embeddings

An embedding is a numerical representation of text, such as a word or a sentence, that captures its semantic meaning. Essentially, it is a list of numbers (a vector) that lets computers work with the relationships and context of words.

Let's see how EmbeddingGemma processes three different words: ["apple", "banana", "car"].

EmbeddingGemma has been trained on vast amounts of text and has learned the relationships between words and concepts.

words = ["apple", "banana", "car"]

# Calculate embeddings by calling model.encode()
embeddings = model.encode(words)

print(embeddings)
for idx, embedding in enumerate(embeddings):
  print(f"Embedding {idx+1}: {embedding.shape}")

[[-0.18476306  0.00167681  0.03773484 ... -0.07996225 -0.02348064
   0.00976741]
 [-0.21189538 -0.02657359  0.02513712 ... -0.08042689 -0.01999852
   0.00512146]
 [-0.18924113 -0.02551468  0.04486253 ... -0.06377774 -0.03699806
   0.03973572]]
Embedding 1: (768,)
Embedding 2: (768,)
Embedding 3: (768,)

The model outputs one numerical vector for each input. The actual vectors are long (768 dimensions), so the printed output abbreviates each one to a few values.

What matters is not the individual numbers but the distance between the vectors. If we were to plot these vectors in a multi-dimensional space, the vectors for apple and banana would sit close to each other, and the vector for car would sit farther away from both.
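
To make "distance between vectors" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are hypothetical values picked by hand for illustration; they are not output from EmbeddingGemma, whose vectors have 768 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional vectors, chosen by hand for illustration only.
toy = {
    "apple": [0.9, 0.8, 0.1],
    "banana": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

for other in ("banana", "car"):
    score = cosine_similarity(toy["apple"], toy[other])
    print(f"apple vs. {other}: {score:.3f}")
```

Vectors that point in nearly the same direction score close to 1, while vectors that point in different directions score lower.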

Determining Similarity

In this section, we use embeddings to determine how semantically similar different sentences are. The examples below show high, medium, and low similarity scores.

High Similarity:
- Sentence A: "The chef prepared a delicious meal for the guests."
- Sentence B: "A tasty dinner was cooked by the chef for the visitors."
- Reasoning: Both sentences describe the same event using different words and grammatical structures (active vs. passive voice). They convey the same core meaning.
Medium Similarity:
- Sentence A: "She is an expert in machine learning."
- Sentence B: "He has a deep interest in artificial intelligence."
- Reasoning: The sentences are related because machine learning is a subfield of artificial intelligence. However, they talk about different people with different levels of engagement (expert vs. interest).
Low Similarity:
- Sentence A: "The weather in Tokyo is sunny today."
- Sentence B: "I need to buy groceries for the week."
- Reasoning: The two sentences are on completely unrelated topics and share no semantic overlap.

# The sentences to encode
sentence_high = [
    "The chef prepared a delicious meal for the guests.",
    "A tasty dinner was cooked by the chef for the visitors."
]
sentence_medium = [
    "She is an expert in machine learning.",
    "He has a deep interest in artificial intelligence."
]
sentence_low = [
    "The weather in Tokyo is sunny today.",
    "I need to buy groceries for the week."
]

for sentence in [sentence_high, sentence_medium, sentence_low]:
  print("🙋‍♂️")
  print(sentence)
  embeddings = model.encode(sentence)
  similarities = model.similarity(embeddings[0], embeddings[1])
  print("`-> 🤖 score: ", similarities.numpy()[0][0])

🙋‍♂️
['The chef prepared a delicious meal for the guests.', 'A tasty dinner was cooked by the chef for the visitors.']
`-> 🤖 score:  0.8002148
🙋‍♂️
['She is an expert in machine learning.', 'He has a deep interest in artificial intelligence.']
`-> 🤖 score:  0.45417833
🙋‍♂️
['The weather in Tokyo is sunny today.', 'I need to buy groceries for the week.']
`-> 🤖 score:  0.22262995

Using Prompts with EmbeddingGemma

To generate the best embeddings with EmbeddingGemma, add an "instructional prompt" or "task" to the beginning of your input text. These prompts optimize the embeddings for specific tasks, such as document retrieval or question answering, and help the model distinguish between different input types, such as a search query versus a document.

How to Apply Prompts

You can apply a prompt during inference in three ways.

Using the prompt argument
Pass the full prompt string directly to the encode method. This gives you precise control.
```
embeddings = model.encode(
    sentence,
    prompt="task: sentence similarity | query: "
)
```
Using the prompt_name argument
Select a predefined prompt by its name. These prompts are loaded from the model's configuration or set during its initialization.
```
embeddings = model.encode(sentence, prompt_name="STS")
```
Using the default prompt
If you specify neither prompt nor prompt_name, the prompt set as default_prompt_name is used automatically. If no default is set, no prompt is applied.
```
embeddings = model.encode(sentence)
```
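
The three options can be thought of as a fallback chain. The sketch below models that chain in plain Python with a hypothetical helper, `resolve_prompt`; it is not the Sentence Transformers implementation. It assumes that an explicit prompt string takes precedence over a name when both are given, and the two prompt strings are copied from this model's prompt table.

```python
def resolve_prompt(prompts, default_prompt_name=None, prompt=None, prompt_name=None):
    """Return the prefix that would be prepended to each input text."""
    if prompt is not None:
        return prompt                        # explicit string (assumed to win)
    if prompt_name is not None:
        return prompts[prompt_name]          # KeyError for unknown names
    if default_prompt_name is not None:
        return prompts[default_prompt_name]  # configured default
    return ""                                # no prompt applied

prompts = {
    "STS": "task: sentence similarity | query: ",
    "Retrieval-query": "task: search result | query: ",
}
text = "The chef prepared a delicious meal for the guests."

print(resolve_prompt(prompts, prompt="task: clustering | query: ") + text)
print(resolve_prompt(prompts, prompt_name="STS") + text)
print(resolve_prompt(prompts) + text)
```

In every case the model sees the prefix followed by the original text. The real library may report an unknown prompt name differently than the KeyError raised here.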

print("Available tasks:")
for name, prefix in model.prompts.items():
  print(f" {name}: \"{prefix}\"")
print("-"*80)

for sentence in [sentence_high, sentence_medium, sentence_low]:
  print("🙋‍♂️")
  print(sentence)
  embeddings = model.encode(sentence, prompt_name="STS")
  similarities = model.similarity(embeddings[0], embeddings[1])
  print("`-> 🤖 score: ", similarities.numpy()[0][0])

Available tasks:
 query: "task: search result | query: "
 document: "title: none | text: "
 BitextMining: "task: search result | query: "
 Clustering: "task: clustering | query: "
 Classification: "task: classification | query: "
 InstructionRetrieval: "task: code retrieval | query: "
 MultilabelClassification: "task: classification | query: "
 PairClassification: "task: sentence similarity | query: "
 Reranking: "task: search result | query: "
 Retrieval: "task: search result | query: "
 Retrieval-query: "task: search result | query: "
 Retrieval-document: "title: none | text: "
 STS: "task: sentence similarity | query: "
 Summarization: "task: summarization | query: "
--------------------------------------------------------------------------------
🙋‍♂️
['The chef prepared a delicious meal for the guests.', 'A tasty dinner was cooked by the chef for the visitors.']
`-> 🤖 score:  0.9363755
🙋‍♂️
['She is an expert in machine learning.', 'He has a deep interest in artificial intelligence.']
`-> 🤖 score:  0.6425841
🙋‍♂️
['The weather in Tokyo is sunny today.', 'I need to buy groceries for the week.']
`-> 🤖 score:  0.38587403

Use Case: Retrieval-Augmented Generation (RAG)

For RAG systems, use the following prompt_name values to create specialized embeddings for your queries and documents:

For Queries: Use prompt_name="Retrieval-query".

query_embedding = model.encode(
    "How do I use prompts with this model?",
    prompt_name="Retrieval-query"
)

For Documents: Use prompt_name="Retrieval-document". To further improve document embeddings, you can also include a title by passing the prompt argument directly:
- With a title:
```
doc_embedding = model.encode(
    "The document text...",
    prompt="title: Using Prompts in RAG | text: "
)
```
- Without a title:
```
doc_embedding = model.encode(
    "The document text...",
    prompt="title: none | text: "
)
```
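
When documents come from a corpus where only some entries have titles, the prefix can be built programmatically. The following is a small sketch; the helper name `build_document_prompt` is hypothetical, and the only thing taken from the examples above is the "title: ... | text: " format with "none" as the placeholder.

```python
def build_document_prompt(title=None):
    """Build the document prefix, falling back to 'none' when no title is known."""
    cleaned = (title or "").strip()
    return f"title: {cleaned or 'none'} | text: "

print(build_document_prompt("Using Prompts in RAG"))
print(build_document_prompt())
```

The returned string would then be passed as the prompt argument, for example `model.encode(text, prompt=build_document_prompt(title))`.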

Further Reading

- For details on all available EmbeddingGemma prompts, see the model card.
- For general information on prompt templates, see the Sentence Transformers documentation.
- For a demo of RAG, see the Simple RAG example in the Gemma Cookbook.

Classification

Classification is the task of assigning a piece of text to one or more predefined categories or labels. It is one of the most fundamental tasks in Natural Language Processing (NLP).

A practical application of text classification is customer support ticket routing. This process automatically directs customer queries to the correct department, saving time and reducing manual work.

labels = ["Billing Issue", "Technical Support", "Sales Inquiry"]

sentence = [
  "Excuse me, the app freezes on the login screen. It won't work even when I try to reset my password.",
  "I would like to inquire about your enterprise plan pricing and features for a team of 50 people.",
]

# Calculate embeddings by calling model.encode()
label_embeddings = model.encode(labels, prompt_name="Classification")
embeddings = model.encode(sentence, prompt_name="Classification")

# Calculate the embedding similarities
similarities = model.similarity(embeddings, label_embeddings)
print(similarities)

idx = similarities.argmax(1)
print(idx)

# Pair each ticket with its highest-scoring label
for example, label_idx in zip(sentence, idx.tolist()):
  print("🙋‍♂️", example, "-> 🤖", labels[label_idx])

tensor([[0.4673, 0.5145, 0.3604],
        [0.4191, 0.5010, 0.5966]])
tensor([1, 2])
🙋‍♂️ Excuse me, the app freezes on the login screen. It won't work even when I try to reset my password. -> 🤖 Technical Support
🙋‍♂️ I would like to inquire about your enterprise plan pricing and features for a team of 50 people. -> 🤖 Sales Inquiry
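
The routing decision is an argmax over each row of the similarity matrix. The sketch below repeats that step in plain Python using the scores printed above, and additionally reports the margin between the best and second-best label. The margin is an addition for illustration, not part of the original example: a small margin could be a signal to send a ticket for manual review, though any cutoff would have to be chosen for your own data.

```python
labels = ["Billing Issue", "Technical Support", "Sales Inquiry"]

# Similarity scores copied from the printed tensor (rows: tickets, columns: labels).
similarities = [
    [0.4673, 0.5145, 0.3604],
    [0.4191, 0.5010, 0.5966],
]

def route(scores, labels):
    """Return (label, best score, margin over the runner-up) for one row of scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return labels[best], scores[best], scores[best] - scores[runner_up]

for row in similarities:
    label, score, margin = route(row, labels)
    print(f"{label}: score={score:.4f}, margin={margin:.4f}")
```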

Matryoshka Representation Learning (MRL)

EmbeddingGemma uses MRL to provide multiple embedding sizes from one model. MRL is a training method that produces a single, high-quality embedding in which the most important information is concentrated at the beginning of the vector.

This means you can get a smaller but still very useful embedding by taking only the first N dimensions of the full embedding. Smaller, truncated embeddings are significantly cheaper to store and faster to process, but this efficiency can come at the cost of lower embedding quality. MRL lets you choose the balance between speed and accuracy that best fits your application's needs.
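
As a rough sense of the storage side of that trade-off, the following sketch computes the raw size of an embedding table at the full 768 dimensions and at two truncated sizes. It assumes 32-bit floats and a hypothetical corpus of one million vectors, and it ignores any overhead from a vector index.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings stored as 32-bit floats

def index_size_bytes(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of a dense embedding table, ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value

num_vectors = 1_000_000  # hypothetical corpus size
for dim in (768, 512, 256):
    size_mb = index_size_bytes(num_vectors, dim) / 1_000_000
    print(f"{dim} dims: {size_mb:,.0f} MB")
```

Under these assumptions the table shrinks from 3,072 MB at 768 dimensions to 2,048 MB at 512 and 1,024 MB at 256.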

Let's take the three words ["apple", "banana", "car"] and create truncated embeddings to see how MRL works.

def check_word_similarities():
  # Calculate the embedding similarities
  print("similarity function: ", model.similarity_fn_name)
  similarities = model.similarity(embeddings[0], embeddings[1:])
  print(similarities)

  for idx, word in enumerate(words[1:]):
    print("🙋‍♂️ apple vs.", word, "-> 🤖 score: ", similarities.numpy()[0][idx])

# Calculate embeddings by calling model.encode()
embeddings = model.encode(words, prompt_name="STS")

check_word_similarities()

similarity function:  cosine
tensor([[0.7510, 0.6685]])
🙋‍♂️ apple vs. banana -> 🤖 score:  0.75102395
🙋‍♂️ apple vs. car -> 🤖 score:  0.6684626

For a faster application, you don't need a new model. Simply truncate the full embeddings to the first 512 dimensions. For best results, it is also recommended to set normalize_embeddings=True, which scales each vector to unit length (a norm of 1).

embeddings = model.encode(words, truncate_dim=512, normalize_embeddings=True)

for idx, embedding in enumerate(embeddings):
  print(f"Embedding {idx+1}: {embedding.shape}")

print("-"*80)
check_word_similarities()

Embedding 1: (512,)
Embedding 2: (512,)
Embedding 3: (512,)
--------------------------------------------------------------------------------
similarity function:  cosine
tensor([[0.7674, 0.7041]])
🙋‍♂️ apple vs. banana -> 🤖 score:  0.767427
🙋‍♂️ apple vs. car -> 🤖 score:  0.7040509
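
Conceptually, truncation keeps the leading values of each vector, and renormalizing restores unit length afterwards. The sketch below shows those two steps with NumPy on random stand-in vectors rather than real model output. The helper name `truncate_and_normalize` is hypothetical, and whether the library performs exactly these steps internally is not confirmed here.

```python
import numpy as np

def truncate_and_normalize(embeddings, dim):
    """Keep the first `dim` values of each row, then rescale rows to unit length."""
    truncated = np.asarray(embeddings, dtype=np.float64)[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))                      # random stand-in for 768-d embeddings
full /= np.linalg.norm(full, axis=1, keepdims=True)   # unit length, like the model output

small = truncate_and_normalize(full, 512)
print(small.shape)
print(np.linalg.norm(full[:, :512], axis=1))  # below 1 before renormalizing
print(np.linalg.norm(small, axis=1))          # 1 after renormalizing
```

A truncated slice of a unit vector is shorter than 1, which is why renormalizing matters if you later compare vectors with a plain dot product.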

In extremely constrained environments, you can shorten the embeddings further to just 256 dimensions. You can also use the more efficient dot product for similarity calculations instead of the standard cosine similarity.

model = SentenceTransformer(model_id, truncate_dim=256, similarity_fn_name="dot").to(device=device)
embeddings = model.encode(words, prompt_name="STS", normalize_embeddings=True)

for idx, embedding in enumerate(embeddings):
  print(f"Embedding {idx+1}: {embedding.shape}")

print("-"*80)
check_word_similarities()

Embedding 1: (256,)
Embedding 2: (256,)
Embedding 3: (256,)
--------------------------------------------------------------------------------
similarity function:  dot
tensor([[0.7855, 0.7382]])
🙋‍♂️ apple vs. banana -> 🤖 score:  0.7854644
🙋‍♂️ apple vs. car -> 🤖 score:  0.7382126
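
Swapping cosine similarity for the dot product is safe when the vectors are normalized, because for unit-length vectors the two measures give the same value and the dot product skips the norm computation. The following plain-Python check illustrates this with two hypothetical vectors that are not model output.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical vectors, for illustration only.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

unit_a, unit_b = normalize(a), normalize(b)
print(cosine(a, b))          # cosine similarity of the raw vectors
print(dot(unit_a, unit_b))   # dot product of the normalized vectors
```

Both lines print the same value up to floating-point rounding. Without normalization, the dot product would also reflect vector length and the two measures would diverge.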

Summary and next steps

You are now equipped to generate high-quality text embeddings using EmbeddingGemma and the Sentence Transformers library. Apply these skills to build features such as semantic similarity, text classification, and Retrieval-Augmented Generation (RAG) systems, and continue exploring what's possible with Gemma models.

Check out the following docs next:

- Fine-tune EmbeddingGemma
- Simple RAG example in the Gemma Cookbook