Generate Embeddings with Sentence Transformers

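EmbeddingGemma is a lightweight, open embedding model designed for fast, high-quality retrieval on everyday devices such as mobile phones. With only 308 million parameters, it is efficient enough to run advanced AI techniques, such as Retrieval-Augmented Generation (RAG), directly on your local machine with no internet connection required.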

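Setup

Before starting this tutorial, complete the following steps:

Get access to Gemma by logging in to Hugging Face and selecting "Acknowledge license" for a Gemma model.
Generate a Hugging Face access token and use it to log in from Colab.

This notebook runs on either CPU or GPU.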

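Install Python packages

Install the libraries required to run the EmbeddingGemma model and generate embeddings. Sentence Transformers is a Python framework for text and image embeddings. For more information, see the Sentence Transformers documentation.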

pip install -U sentence-transformers git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview

After you have accepted the license, you need a valid Hugging Face token to access the model.

# Log in to the Hugging Face Hub
from huggingface_hub import login
login()

Load Model

Use the sentence-transformers library to create an instance of a model class with EmbeddingGemma.

import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "google/embeddinggemma-300M"
model = SentenceTransformer(model_id).to(device=device)

print(f"Device: {model.device}")
print(model)
print("Total number of parameters in the model:", sum([p.numel() for _, p in model.named_parameters()]))

Device: cuda:0
SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
Total number of parameters in the model: 307581696
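
The module list printed above suggests a pipeline of mean pooling over token vectors, two bias-free dense layers, and a final normalization step. As a rough illustration only, the sketch below shows what mean pooling and L2 normalization do to a handful of made-up token vectors. It skips the dense layers and padding masks, and the helper names `mean_pool` and `l2_normalize` are hypothetical, not part of Sentence Transformers.

```python
import math

def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into a single vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Made-up token vectors for a 3-token input (real ones have 768 dimensions).
token_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
    [2.0, 2.0, 1.0],
]

pooled = mean_pool(token_vectors)
sentence_embedding = l2_normalize(pooled)

print("pooled:", pooled)
print("normalized:", sentence_embedding)
```

Because the last module normalizes its output, the embeddings returned by the model are expected to have unit length, which matters later when comparing cosine similarity and dot product.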

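Generating Embeddings

An embedding is a numerical representation of text, such as a word or sentence, that captures its semantic meaning. Essentially, it is a list of numbers (a vector) that lets computers work with the relationships and context of words.

Let's see how EmbeddingGemma processes three different words: ["apple", "banana", "car"].

EmbeddingGemma has been trained on vast amounts of text and has learned the relationships between words and concepts.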

words = ["apple", "banana", "car"]

# Calculate embeddings by calling model.encode()
embeddings = model.encode(words)

print(embeddings)
for idx, embedding in enumerate(embeddings):
  print(f"Embedding {idx+1}: {embedding.shape}")

[[-0.18476306  0.00167681  0.03773484 ... -0.07996225 -0.02348064
   0.00976741]
 [-0.21189538 -0.02657359  0.02513712 ... -0.08042689 -0.01999852
   0.00512146]
 [-0.18924113 -0.02551468  0.04486253 ... -0.06377774 -0.03699806
   0.03973572]]
Embedding 1: (768,)
Embedding 2: (768,)
Embedding 3: (768,)

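The model outputs a numerical vector for each input. The actual vectors are long (768 dimensions), so the printed output shows only a few dimensions of each.

What matters is not the individual numbers themselves but the distance between the vectors. If we were to plot these vectors in a multi-dimensional space, the vectors for apple and banana would be very close to each other, and the vector for car would be far away from the other two.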
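
To make "distance between vectors" concrete, here is a minimal sketch of cosine similarity using only the standard library. The 4-dimensional vectors are invented for illustration and are not model output; real EmbeddingGemma vectors have 768 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, chosen so the two fruits point in similar directions.
toy = {
    "apple":  [0.9, 0.8, 0.1, 0.0],
    "banana": [0.8, 0.9, 0.2, 0.1],
    "car":    [0.1, 0.0, 0.9, 0.8],
}

print("apple vs. banana:", cosine_similarity(toy["apple"], toy["banana"]))
print("apple vs. car:   ", cosine_similarity(toy["apple"], toy["car"]))
```

With these toy values, apple and banana score much higher than apple and car, which is the pattern the paragraph above describes.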

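Determining Similarity

In this section, we use embeddings to determine how semantically similar different sentences are. The examples below show high, medium, and low similarity scores.

High Similarity:
- Sentence A: "The chef prepared a delicious meal for the guests."
- Sentence B: "A tasty dinner was cooked by the chef for the visitors."
- Reasoning: Both sentences describe the same event using different words and grammatical structures (active vs. passive voice). They convey the same core meaning.
Medium Similarity:
- Sentence A: "She is an expert in machine learning."
- Sentence B: "He has a deep interest in artificial intelligence."
- Reasoning: The sentences are related because machine learning is a subfield of artificial intelligence. However, they talk about different people with different levels of engagement (expert vs. interest).
Low Similarity:
- Sentence A: "The weather in Tokyo is sunny today."
- Sentence B: "I need to buy groceries for the week."
- Reasoning: The two sentences are about completely unrelated topics and share no semantic overlap.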

# The sentences to encode
sentence_high = [
    "The chef prepared a delicious meal for the guests.",
    "A tasty dinner was cooked by the chef for the visitors."
]
sentence_medium = [
    "She is an expert in machine learning.",
    "He has a deep interest in artificial intelligence."
]
sentence_low = [
    "The weather in Tokyo is sunny today.",
    "I need to buy groceries for the week."
]

for sentence in [sentence_high, sentence_medium, sentence_low]:
  print("🙋‍♂️")
  print(sentence)
  embeddings = model.encode(sentence)
  similarities = model.similarity(embeddings[0], embeddings[1])
  print("`-> 🤖 score: ", similarities.numpy()[0][0])

🙋‍♂️
['The chef prepared a delicious meal for the guests.', 'A tasty dinner was cooked by the chef for the visitors.']
`-> 🤖 score:  0.8002148
🙋‍♂️
['She is an expert in machine learning.', 'He has a deep interest in artificial intelligence.']
`-> 🤖 score:  0.45417833
🙋‍♂️
['The weather in Tokyo is sunny today.', 'I need to buy groceries for the week.']
`-> 🤖 score:  0.22262995

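Using Prompts with EmbeddingGemma

To generate the best embeddings with EmbeddingGemma, add an "instructional prompt" or "task" to the beginning of your input text. These prompts optimize the embeddings for specific tasks, such as document retrieval or question answering, and help the model distinguish between different input types, such as a search query versus a document.

How to Apply Prompts

You can apply a prompt during inference in three ways.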

Using the prompt argument
Pass the full prompt string directly to the encode method. This gives you precise control.
```
embeddings = model.encode(
    sentence,
    prompt="task: sentence similarity | query: "
)
```
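Using the prompt_name argument
Select a predefined prompt by its name. These prompts are loaded from the model's configuration or set during its initialization.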
```
embeddings = model.encode(sentence, prompt_name="STS")
```
Using the default prompt
If you specify neither prompt nor prompt_name, the prompt set as default_prompt_name is used automatically. If no default is set, no prompt is applied.
```
embeddings = model.encode(sentence)
```
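
The three options above differ only in how the prefix string is chosen; the chosen prefix is then placed in front of the input text. The sketch below mimics that selection logic in plain Python. It assumes the precedence is prompt, then prompt_name, then the default; `resolve_prompt` and `apply_prompt` are hypothetical helpers written for this illustration, not functions of the Sentence Transformers library.

```python
# A subset of the prompt table that this model reports via model.prompts.
PROMPTS = {
    "Retrieval-query": "task: search result | query: ",
    "Retrieval-document": "title: none | text: ",
    "STS": "task: sentence similarity | query: ",
}

def resolve_prompt(prompts, prompt=None, prompt_name=None, default_prompt_name=None):
    """Pick a prefix: explicit prompt first, then a named one, then the default."""
    if prompt is not None:
        return prompt
    if prompt_name is not None:
        return prompts[prompt_name]
    if default_prompt_name is not None:
        return prompts[default_prompt_name]
    return ""  # no prompt applied

def apply_prompt(text, **kwargs):
    """Return the text as the model would see it, with the prefix prepended."""
    return resolve_prompt(PROMPTS, **kwargs) + text

print(apply_prompt("How are you?", prompt="task: sentence similarity | query: "))
print(apply_prompt("How are you?", prompt_name="STS"))
print(apply_prompt("How are you?"))
```

The first two calls produce the same string, which is why passing prompt_name="STS" and passing the full STS prompt text are interchangeable for this model.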

print("Available tasks:")
for name, prefix in model.prompts.items():
  print(f" {name}: \"{prefix}\"")
print("-"*80)

for sentence in [sentence_high, sentence_medium, sentence_low]:
  print("🙋‍♂️")
  print(sentence)
  embeddings = model.encode(sentence, prompt_name="STS")
  similarities = model.similarity(embeddings[0], embeddings[1])
  print("`-> 🤖 score: ", similarities.numpy()[0][0])

Available tasks:
 query: "task: search result | query: "
 document: "title: none | text: "
 BitextMining: "task: search result | query: "
 Clustering: "task: clustering | query: "
 Classification: "task: classification | query: "
 InstructionRetrieval: "task: code retrieval | query: "
 MultilabelClassification: "task: classification | query: "
 PairClassification: "task: sentence similarity | query: "
 Reranking: "task: search result | query: "
 Retrieval: "task: search result | query: "
 Retrieval-query: "task: search result | query: "
 Retrieval-document: "title: none | text: "
 STS: "task: sentence similarity | query: "
 Summarization: "task: summarization | query: "
--------------------------------------------------------------------------------
🙋‍♂️
['The chef prepared a delicious meal for the guests.', 'A tasty dinner was cooked by the chef for the visitors.']
`-> 🤖 score:  0.9363755
🙋‍♂️
['She is an expert in machine learning.', 'He has a deep interest in artificial intelligence.']
`-> 🤖 score:  0.6425841
🙋‍♂️
['The weather in Tokyo is sunny today.', 'I need to buy groceries for the week.']
`-> 🤖 score:  0.38587403

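Use Case: Retrieval-Augmented Generation (RAG)

For RAG systems, use the following prompt_name values to create specialized embeddings for your queries and documents:

Queries: Use prompt_name="Retrieval-query".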

query_embedding = model.encode(
    "How do I use prompts with this model?",
    prompt_name="Retrieval-query"
)

Documents: Use prompt_name="Retrieval-document". To further improve document embeddings, you can also include a title by using the prompt argument directly:

With a title:

doc_embedding = model.encode(
    "The document text...",
    prompt="title: Using Prompts in RAG | text: "
)

Without a title:

doc_embedding = model.encode(
    "The document text...",
    prompt="title: none | text: "
)
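
The two document prompts above follow one pattern, and retrieval then comes down to ranking document vectors against the query vector. The sketch below illustrates both steps with hypothetical helpers (`document_prompt`, `rank_documents`) and made-up 2-dimensional vectors standing in for real embeddings. It is an illustration under those assumptions, not the retrieval code of any particular RAG framework.

```python
def document_prompt(title=None):
    """Build the document prefix, using 'none' when no title is available."""
    return f"title: {title if title else 'none'} | text: "

def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted by descending dot-product score."""
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

print(repr(document_prompt("Using Prompts in RAG")))
print(repr(document_prompt()))

# Made-up unit-length vectors; in practice these come from model.encode().
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [0.8, 0.6], [0.6, 0.8]]
print("ranking:", rank_documents(query_vec, doc_vecs))
```

In a real pipeline, the string returned by `document_prompt` would be passed as the prompt argument of model.encode, and the top-ranked documents would be handed to the generator as context.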

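Classification

Classification is the task of assigning a piece of text to one or more predefined categories or labels. It is one of the most fundamental tasks in Natural Language Processing (NLP).

A practical application of text classification is customer support ticket routing. This process automatically directs customer queries to the correct department, saving time and reducing manual work.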

labels = ["Billing Issue", "Technical Support", "Sales Inquiry"]

sentence = [
  "Excuse me, the app freezes on the login screen. It won't work even when I try to reset my password.",
  "I would like to inquire about your enterprise plan pricing and features for a team of 50 people.",
]

# Calculate embeddings by calling model.encode()
label_embeddings = model.encode(labels, prompt_name="Classification")
embeddings = model.encode(sentence, prompt_name="Classification")

# Calculate the embedding similarities
similarities = model.similarity(embeddings, label_embeddings)
print(similarities)

idx = similarities.argmax(1)
print(idx)

# Pair each sentence with its predicted label index (also works if sentences repeat)
for example, label_idx in zip(sentence, idx):
  print("🙋‍♂️", example, "-> 🤖", labels[int(label_idx)])

tensor([[0.4673, 0.5145, 0.3604],
        [0.4191, 0.5010, 0.5966]])
tensor([1, 2])
🙋‍♂️ Excuse me, the app freezes on the login screen. It won't work even when I try to reset my password. -> 🤖 Technical Support
🙋‍♂️ I would like to inquire about your enterprise plan pricing and features for a team of 50 people. -> 🤖 Sales Inquiry
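
To show what the argmax step does, the sketch below reuses the similarity scores from the printed tensor above and picks the best label for each row in plain Python. The `argmax` helper is written here for illustration; the tutorial code itself relies on the tensor's own argmax method.

```python
labels = ["Billing Issue", "Technical Support", "Sales Inquiry"]

# Scores copied from the printed tensor: one row per sentence, one column per label.
similarities = [
    [0.4673, 0.5145, 0.3604],
    [0.4191, 0.5010, 0.5966],
]

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

predicted = [labels[argmax(row)] for row in similarities]
print(predicted)
```

Note how close the scores in each row are. In a production router, it may be worth checking the margin between the top two labels before acting on a prediction.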

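Matryoshka Representation Learning (MRL)

EmbeddingGemma uses MRL to provide multiple embedding sizes from one model. It is a training method that produces a single high-quality embedding in which the most important information is concentrated at the beginning of the vector.

This means you can get a smaller but still very useful embedding by taking only the first N dimensions of the full embedding. Smaller, truncated embeddings are cheaper to store and faster to process, but this efficiency can come at the cost of lower embedding quality. MRL lets you choose the balance between speed and accuracy that fits your application's needs.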
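
Mechanically, using a shorter MRL embedding amounts to keeping the first N values and rescaling the result to unit length. The sketch below shows that operation on a made-up 6-dimensional vector; `truncate_and_normalize` is a hypothetical helper, and real embeddings would be truncated from 768 dimensions to sizes such as 512 or 256.

```python
import math

def truncate_and_normalize(vector, dim):
    """Keep the first `dim` values, then rescale to unit length."""
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Made-up "full" embedding, for illustration only.
full = [3.0, 4.0, 1.0, 2.0, 0.5, 0.5]

small = truncate_and_normalize(full, 2)
print(small)
print("length:", math.sqrt(sum(x * x for x in small)))
```

Renormalizing after truncation matters because the leading slice of a unit vector is generally shorter than 1, which would otherwise distort dot-product scores.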

Let's use three words, ["apple", "banana", "car"], and create simplified embeddings to see how MRL works.

def check_word_similarities():
  # Calculate the embedding similarities
  print("similarity function: ", model.similarity_fn_name)
  similarities = model.similarity(embeddings[0], embeddings[1:])
  print(similarities)

  for idx, word in enumerate(words[1:]):
    print("🙋‍♂️ apple vs.", word, "-> 🤖 score: ", similarities.numpy()[0][idx])

# Calculate embeddings by calling model.encode()
embeddings = model.encode(words, prompt_name="STS")

check_word_similarities()

similarity function:  cosine
tensor([[0.7510, 0.6685]])
🙋‍♂️ apple vs. banana -> 🤖 score:  0.75102395
🙋‍♂️ apple vs. car -> 🤖 score:  0.6684626

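Now, to make your application faster, you don't need a new model. Simply truncate the full embeddings to the first 512 dimensions. For best results, it is also recommended to set normalize_embeddings=True, which scales the vectors to unit length (1).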

embeddings = model.encode(words, truncate_dim=512, normalize_embeddings=True)

for idx, embedding in enumerate(embeddings):
  print(f"Embedding {idx+1}: {embedding.shape}")

print("-"*80)
check_word_similarities()

Embedding 1: (512,)
Embedding 2: (512,)
Embedding 3: (512,)
--------------------------------------------------------------------------------
similarity function:  cosine
tensor([[0.7674, 0.7041]])
🙋‍♂️ apple vs. banana -> 🤖 score:  0.767427
🙋‍♂️ apple vs. car -> 🤖 score:  0.7040509

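In extremely constrained environments, you can shorten the embeddings further to just 256 dimensions. You can also compute similarity with the more efficient dot product instead of the standard cosine similarity.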

model = SentenceTransformer(model_id, truncate_dim=256, similarity_fn_name="dot").to(device=device)
embeddings = model.encode(words, prompt_name="STS", normalize_embeddings=True)

for idx, embedding in enumerate(embeddings):
  print(f"Embedding {idx+1}: {embedding.shape}")

print("-"*80)
check_word_similarities()

Embedding 1: (256,)
Embedding 2: (256,)
Embedding 3: (256,)
--------------------------------------------------------------------------------
similarity function:  dot
tensor([[0.7855, 0.7382]])
🙋‍♂️ apple vs. banana -> 🤖 score:  0.7854644
🙋‍♂️ apple vs. car -> 🤖 score:  0.7382126
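
Swapping cosine similarity for the dot product is safe here because the embeddings are normalized: for unit-length vectors the two measures give the same value, and the dot product skips the norm computation. The sketch below checks this on two made-up vectors using only the standard library; the helper names are illustrative.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector]

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors, normalized the way normalize_embeddings=True would.
a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([2.0, 0.5, 1.0])

print("dot:   ", dot(a, b))
print("cosine:", cosine_similarity(a, b))
```

If the vectors were not normalized, the dot product would also reflect vector length, and the two scores could rank pairs differently.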

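Summary and Next Steps

You can now generate high-quality text embeddings using EmbeddingGemma and the Sentence Transformers library. Apply these skills to build features such as semantic similarity, text classification, and Retrieval-Augmented Generation (RAG) systems, and continue exploring what is possible with Gemma models.

Check out the following docs next:

Fine-tune EmbeddingGemma
Simple RAG example in the Gemma Cookbook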