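Image embedding guide for Python

The MediaPipe Image Embedder task lets you convert image data into a numeric representation to accomplish ML-related image processing tasks, such as comparing the similarity of two images. These instructions show you how to use the Image Embedder with Python.

For more information about the capabilities, models, and configuration options of this task, see the Overview.

Code example

The example code for Image Embedder provides an implementation of this task for your reference. This code helps you test the task and get started on building your own image embedder. You can view, run, and edit the Image Embedder example code using just your web browser with Google Colab. You can view the source code for this example on GitHub.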

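Setup

This section describes key steps for setting up your development environment and code projects specifically to use Image Embedder. For general information on setting up your development environment for using MediaPipe tasks, including platform version requirements, see the Setup guide for Python.

Packages

The Image Embedder task uses the mediapipe pip package. You can install the dependency with the following command: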

$ python -m pip install mediapipe

Imports

Import the following classes to access the Image Embedder task functions:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

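Model

The MediaPipe Image Embedder task requires a trained model that is compatible with this task. For more information on available trained models for Image Embedder, see the Models section of the task overview.

Select and download a model, and then store it in a local directory. You can use the recommended MobileNetV3 model.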

model_path = '/absolute/path/to/mobilenet_v3_small_075_224_embedder.tflite'

Specify the path of the model within the model_asset_path parameter, as shown below:

base_options = BaseOptions(model_asset_path=model_path)

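Create the task

You can use the create_from_options function to create the task. The create_from_options function accepts configuration options that set the embedder options. For more information on configuration options, see Configuration options.

The Image Embedder task supports 3 input data types: still images, video files, and live video streams. Choose the tab corresponding to your input data type to see how to create the task and run inference.

Image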

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
ImageEmbedder = mp.tasks.vision.ImageEmbedder
ImageEmbedderOptions = mp.tasks.vision.ImageEmbedderOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = ImageEmbedderOptions(
    base_options=BaseOptions(model_asset_path='/path/to/model.tflite'),
    quantize=True,
    running_mode=VisionRunningMode.IMAGE)

with ImageEmbedder.create_from_options(options) as embedder:
  # The embedder is initialized. Use it here.
  ...  # Placeholder: call embedder.embed(...) in this block.

Video

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
ImageEmbedder = mp.tasks.vision.ImageEmbedder
ImageEmbedderOptions = mp.tasks.vision.ImageEmbedderOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = ImageEmbedderOptions(
    base_options=BaseOptions(model_asset_path='/path/to/model.tflite'),
    quantize=True,
    running_mode=VisionRunningMode.VIDEO)

with ImageEmbedder.create_from_options(options) as embedder:
  # The embedder is initialized. Use it here.
  ...  # Placeholder: call embedder.embed_for_video(...) in this block.

Live stream

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
ImageEmbedderResult = mp.tasks.vision.ImageEmbedderResult
ImageEmbedder = mp.tasks.vision.ImageEmbedder
ImageEmbedderOptions = mp.tasks.vision.ImageEmbedderOptions
VisionRunningMode = mp.tasks.vision.RunningMode

def print_result(result: ImageEmbedderResult, output_image: mp.Image, timestamp_ms: int):
    print('ImageEmbedderResult result: {}'.format(result))

options = ImageEmbedderOptions(
    base_options=BaseOptions(model_asset_path='/path/to/model.tflite'),
    running_mode=VisionRunningMode.LIVE_STREAM,
    quantize=True,
    result_callback=print_result)

with ImageEmbedder.create_from_options(options) as embedder:
  # The embedder is initialized. Use it here.
  ...  # Placeholder: call embedder.embed_async(...) in this block.
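In live stream mode the results arrive through the callback rather than as a return value, so an application usually needs somewhere to keep them. The following is a minimal sketch of one way to do that, not part of the MediaPipe API: the ResultCollector class and its method names are hypothetical, and it assumes the callback may run on a different thread than the one submitting frames, which is why a lock is used.

```python
import threading
from typing import Any, Dict, Optional


class ResultCollector:
    """Hypothetical helper that stores results delivered to a live stream callback."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[int, Any] = {}

    def on_result(self, result: Any, output_image: Any, timestamp_ms: int) -> None:
        # Same parameter order as the print_result callback shown above.
        with self._lock:
            self._results[timestamp_ms] = result

    def latest(self) -> Optional[Any]:
        # Return the result with the largest timestamp, or None if nothing arrived yet.
        with self._lock:
            if not self._results:
                return None
            return self._results[max(self._results)]
```

To use it, create one collector and pass result_callback=collector.on_result in the options instead of print_result, then poll collector.latest() from the application loop.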

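Configuration options

This task has the following configuration options for Python applications:

Option Name	Description	Value Range	Default Value
`running_mode`	Sets the running mode for the task. There are three modes: IMAGE: The mode for single image inputs. VIDEO: The mode for decoded frames of a video. LIVE_STREAM: The mode for a live stream of input data, such as from a camera. In this mode, result_callback must be set to provide a listener that receives results asynchronously.	{`IMAGE, VIDEO, LIVE_STREAM`}	`IMAGE`
`l2_normalize`	Whether to normalize the returned feature vector with the L2 norm. Use this option only if the model does not already contain a native L2_NORMALIZATION TFLite Op. In most cases the model already contains it, so L2 normalization is achieved through TFLite inference with no need for this option.	`Boolean`	`False`
`quantize`	Whether the returned embedding should be quantized to bytes via scalar quantization. Embeddings are implicitly assumed to be unit-norm, so every dimension is guaranteed to have a value in [-1.0, 1.0]. Use the l2_normalize option if this is not the case.	`Boolean`	`False`
`result_callback`	Sets the result listener to receive the embedding results asynchronously when the Image Embedder is in live stream mode. Can only be used when the running mode is set to `LIVE_STREAM`.	N/A	Not set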
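To make the relationship between l2_normalize and quantize more concrete, the sketch below normalizes a vector to unit length and then maps each value in [-1.0, 1.0] to a signed byte. It is an illustration only: the function names are hypothetical, and the scale-by-128, round, and clamp scheme is an assumption for this example, not a statement of how MediaPipe implements the option internally.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


def scalar_quantize(vector: Sequence[float]) -> List[int]:
    """Map each value in [-1.0, 1.0] to a signed 8-bit integer.

    Assumed scheme for illustration: multiply by 128, round, clamp to [-128, 127].
    """
    return [max(-128, min(127, round(v * 128))) for v in vector]


embedding = [3.0, 4.0, 0.0]
unit = l2_normalize(embedding)     # [0.6, 0.8, 0.0]
quantized = scalar_quantize(unit)  # [77, 102, 0]
```

Without the normalization step, the values 3.0 and 4.0 would both be clamped to 127, which is why quantization assumes unit-norm embeddings.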

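Prepare data

Prepare your input as an image file or a numpy array, then convert it to a mediapipe.Image object. If your input is a video file or a live stream from a webcam, you can use an external library such as OpenCV to load your input frames as numpy arrays.

Image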

import mediapipe as mp

# Load the input image from an image file.
mp_image = mp.Image.create_from_file('/path/to/image')

# Load the input image from a numpy array.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=numpy_image)

Video

import mediapipe as mp

# Use OpenCV’s VideoCapture to load the input video.

# Load the frame rate of the video using OpenCV’s cv2.CAP_PROP_FPS
# You’ll need it to calculate the timestamp for each frame.

# Loop through each frame in the video using VideoCapture#read()

# Convert the frame received from OpenCV to a MediaPipe’s Image object.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=numpy_frame_from_opencv)

Live stream

import mediapipe as mp

# Use OpenCV’s VideoCapture to start capturing from the webcam.

# Create a loop to read the latest frame from the camera using VideoCapture#read()

# Convert the frame received from OpenCV to a MediaPipe’s Image object.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=numpy_frame_from_opencv)

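Run the task

You can call the embed function corresponding to your running mode to trigger inferences. The Image Embedder API returns the embedding vectors for the input image or frame.

Image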

# Perform image embedding on the provided single image.
embedding_result = embedder.embed(mp_image)

Video

# Calculate the timestamp of the current frame
frame_timestamp_ms = int(1000 * frame_index / video_file_fps)

# Perform image embedding on the video frame.
embedding_result = embedder.embed_for_video(mp_image, frame_timestamp_ms)

Live stream

# Send the latest frame to perform image embedding.
# Results are sent to the `result_callback` provided in the `ImageEmbedderOptions`.
embedder.embed_async(mp_image, frame_timestamp_ms)

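Note the following:

When running in the video mode or the live stream mode, you must also provide the Image Embedder task with the timestamp of the input frame.
When running in the image or the video mode, the Image Embedder task blocks the current thread until it finishes processing the input image or frame.
When running in the live stream mode, the Image Embedder task does not block the current thread but returns immediately. It invokes its result listener with the embedding result every time it finishes processing an input frame. If the embed_async function is called while the Image Embedder task is busy processing another frame, the task ignores the new input frame.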
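The timestamp requirement in the notes above can be sketched as follows. This is an illustrative helper, not part of MediaPipe: the names video_timestamp_ms and MonotonicClockMs are hypothetical. It assumes timestamps are integer milliseconds that must increase from one frame to the next, derived from the frame index for video files and from a monotonic clock for live streams.

```python
import time


def video_timestamp_ms(frame_index: int, fps: float) -> int:
    """Timestamp of a video frame in whole milliseconds, from its index and the frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return int(1000 * frame_index / fps)


class MonotonicClockMs:
    """Hypothetical clock that hands out strictly increasing millisecond timestamps."""

    def __init__(self) -> None:
        self._start = time.monotonic()
        self._last = -1

    def now(self) -> int:
        elapsed = int((time.monotonic() - self._start) * 1000)
        # Bump by 1 ms if two frames arrive within the same millisecond.
        self._last = max(elapsed, self._last + 1)
        return self._last
```

In video mode, video_timestamp_ms(frame_index, video_file_fps) would supply the value passed to embed_for_video; in live stream mode, a single MonotonicClockMs instance would supply the value passed to embed_async for each captured frame.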

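Handle and display results

Upon running inference, the Image Embedder task returns an ImageEmbedderResult object that contains the list of embeddings for the input image or frame.

The following shows an example of the output data from this task: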

ImageEmbedderResult:
  Embedding #0 (sole embedding head):
    float_embedding: {0.0, 0.0, ..., 0.0, 1.0, 0.0, 0.0, 2.0}
    head_index: 0

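This result was obtained by embedding an example image.

You can compare the similarity of two embeddings using the ImageEmbedder.cosine_similarity function. See the following code for an example.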

# Compute cosine similarity.
similarity = ImageEmbedder.cosine_similarity(
  embedding_result.embeddings[0],
  other_embedding_result.embeddings[0])
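For intuition about what the similarity score means, the sketch below computes cosine similarity on plain Python lists of floats. It is a standalone illustration, not the MediaPipe implementation: the function here is a hypothetical stand-in that takes raw float vectors, whereas ImageEmbedder.cosine_similarity takes the embedding objects from the result.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

The score ranges from -1.0 to 1.0. Vectors pointing in the same direction score close to 1.0 regardless of their length, orthogonal vectors score 0.0, and opposite vectors score -1.0.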