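Image embedding guide for iOS

The MediaPipe Image Embedder task lets you convert image data into a numeric representation to accomplish ML-related image processing tasks, such as comparing the similarity of two images.

The code sample described in these instructions is available on GitHub. You can see this task in action by viewing this Web demo. For more information about the capabilities, models, and configuration options of this task, see the Overview.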

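Code example

The MediaPipe Tasks example code is a basic implementation of an Image Embedder app for iOS. The example uses the camera on a physical iOS device to continuously embed images, and it can also run the embedder on image files from the device gallery.

You can use the app as a starting point for your own iOS app, or refer to it when modifying an existing app. The Image Embedder example code is hosted on GitHub.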

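Download the code

The following instructions show you how to create a local copy of the example code using the git command line tool.

To download the example code:

Clone the git repository using the following command: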

git clone https://github.com/google-ai-edge/mediapipe-samples

Optionally, configure your git instance to use sparse checkout, so that you have only the files for the Image Embedder example app:
```
cd mediapipe-samples
git sparse-checkout init --cone
git sparse-checkout set examples/image_embedder/ios
```

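After creating a local version of the example code, you can install the MediaPipe task library, open the project using Xcode, and run the app. For instructions, see the Setup Guide for iOS.

Key components

The following files contain the crucial code for the Image Embedder example application:

ImageEmbedderService.swift: Initializes the Image Embedder, handles model selection, and runs inference on the input data.
CameraViewController.swift: Implements the UI for the live camera feed input mode and visualizes the results.
MediaLibraryViewController.swift: Implements the UI for the still image input mode and visualizes the results.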

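Setup

This section describes the key steps for setting up your development environment and code project to use Image Embedder. For general information on setting up your development environment for MediaPipe tasks, including platform version requirements, see the Setup Guide for iOS.

Dependencies

Image Embedder uses the MediaPipeTasksVision library, which must be installed using CocoaPods. The library is compatible with both Swift and Objective-C apps and does not require any additional language-specific setup.

For instructions on installing CocoaPods on macOS, refer to the CocoaPods installation guide. For instructions on creating a Podfile with the pods your app needs, refer to Using CocoaPods.

Add the MediaPipeTasksVision pod to the Podfile using the following code: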

target 'MyImageEmbedderApp' do
  use_frameworks!
  pod 'MediaPipeTasksVision'
end

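If your app includes unit test targets, refer to the Setup Guide for iOS for additional information on setting up your Podfile.

Model

The MediaPipe Image Embedder task requires a trained model that is compatible with this task. For more information about the available trained models for Image Embedder, see the Models section.

Select and download a model, then add it to your project directory using Xcode. For instructions on adding files to your Xcode project, refer to Managing files and folders in your Xcode project.

Use the BaseOptions.modelAssetPath property to specify the path to the model in your app bundle.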

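Create the task

You can create the Image Embedder task by calling one of its initializers. The ImageEmbedder(options:) initializer accepts values for the configuration options.

If you don't need an Image Embedder initialized with customized configuration options, you can use the ImageEmbedder(modelPath:) initializer to create an Image Embedder with the default options. For more information about configuration options, see Configuration Overview.

The Image Embedder task supports 3 input data types: still images, video files, and live video streams. By default, ImageEmbedder(modelPath:) initializes a task for still images. If you want the task to process video files or live video streams, use ImageEmbedder(options:) to specify the video or live stream running mode. Live stream mode also requires the additional imageEmbedderLiveStreamDelegate configuration option, which enables the Image Embedder to deliver image embedding results to the delegate asynchronously.

Choose the example below that corresponds to your running mode to see how to create the task and run inference.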

Swift

Image

import MediaPipeTasksVision

let modelPath = Bundle.main.path(
  forResource: "model",
  ofType: "tflite")

let options = ImageEmbedderOptions()
options.baseOptions.modelAssetPath = modelPath
options.quantize = true
options.l2Normalize = true

let imageEmbedder = try ImageEmbedder(options: options)

Video

import MediaPipeTasksVision

let modelPath = Bundle.main.path(
  forResource: "model",
  ofType: "tflite")

let options = ImageEmbedderOptions()
options.baseOptions.modelAssetPath = modelPath
options.runningMode = .video
options.quantize = true
options.l2Normalize = true

let imageEmbedder = try ImageEmbedder(options: options)

Live stream

import MediaPipeTasksVision

// Class that conforms to the `ImageEmbedderLiveStreamDelegate` protocol and
// implements the method that the image embedder calls once it finishes
// embedding each input frame.
class ImageEmbedderResultProcessor: NSObject, ImageEmbedderLiveStreamDelegate {

  func imageEmbedder(
    _ imageEmbedder: ImageEmbedder,
    didFinishEmbedding result: ImageEmbedderResult?,
    timestampInMilliseconds: Int,
    error: Error?) {

    // Process the image embedder result or errors here.

  }
}

let modelPath = Bundle.main.path(
  forResource: "model",
  ofType: "tflite")

let options = ImageEmbedderOptions()
options.baseOptions.modelAssetPath = modelPath
options.runningMode = .liveStream
options.quantize = true
options.l2Normalize = true

// Assign an object of the class to the `imageEmbedderLiveStreamDelegate`
// property.
let processor = ImageEmbedderResultProcessor()
options.imageEmbedderLiveStreamDelegate = processor

let imageEmbedder = try ImageEmbedder(options: options)

Objective-C

Image

@import MediaPipeTasksVision;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];

MPPImageEmbedderOptions *options = [[MPPImageEmbedderOptions alloc] init];
options.baseOptions.modelAssetPath = modelPath;
options.runningMode = MPPRunningModeImage;
options.quantize = YES;
options.l2Normalize = YES;

MPPImageEmbedder *imageEmbedder =
  [[MPPImageEmbedder alloc] initWithOptions:options error:nil];

Video

@import MediaPipeTasksVision;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];

MPPImageEmbedderOptions *options = [[MPPImageEmbedderOptions alloc] init];
options.baseOptions.modelAssetPath = modelPath;
options.runningMode = MPPRunningModeVideo;
options.quantize = YES;
options.l2Normalize = YES;

MPPImageEmbedder *imageEmbedder =
  [[MPPImageEmbedder alloc] initWithOptions:options error:nil];

Live stream

@import MediaPipeTasksVision;

// Class that conforms to the `MPPImageEmbedderLiveStreamDelegate` protocol
// and implements the method that the image embedder calls once it finishes
// embedding each input frame.
@interface APPImageEmbedderResultProcessor : NSObject <MPPImageEmbedderLiveStreamDelegate>

@end

@implementation APPImageEmbedderResultProcessor

- (void)imageEmbedder:(MPPImageEmbedder *)imageEmbedder
    didFinishEmbeddingWithResult:(MPPImageEmbedderResult *)imageEmbedderResult
         timestampInMilliseconds:(NSInteger)timestampInMilliseconds
                           error:(NSError *)error {

    // Process the image embedder result or errors here.

}

@end

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];

MPPImageEmbedderOptions *options = [[MPPImageEmbedderOptions alloc] init];
options.baseOptions.modelAssetPath = modelPath;
options.runningMode = MPPRunningModeLiveStream;
options.quantize = YES;
options.l2Normalize = YES;

// Assign an object of the class to the `imageEmbedderLiveStreamDelegate`
// property.
APPImageEmbedderResultProcessor *processor =
  [APPImageEmbedderResultProcessor new];
options.imageEmbedderLiveStreamDelegate = processor;

MPPImageEmbedder *imageEmbedder =
  [[MPPImageEmbedder alloc] initWithOptions:options error:nil];

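Configuration options

This task has the following configuration options for iOS apps:

Option name	Description	Value range	Default value
`runningMode`	Sets the running mode for the task. Image Embedder has three modes: IMAGE: the mode for single image inputs. VIDEO: the mode for decoded frames of a video. LIVE_STREAM: the mode for a live stream of input data, such as from a camera. In this mode, `imageEmbedderLiveStreamDelegate` must be set to an instance of a class that implements `ImageEmbedderLiveStreamDelegate` to receive the results of embedding image frames asynchronously.	{RunningMode.image, RunningMode.video, RunningMode.liveStream}	RunningMode.image
`l2Normalize`	Whether to normalize the returned feature vector with the L2 norm. Use this option only if the model does not already contain a native L2_NORMALIZATION TFLite op. In most cases the model already contains one, so L2 normalization is achieved through TFLite inference and this option is not needed.	Bool	false
`quantize`	Whether the returned embedding should be quantized to bytes via scalar quantization. Embeddings are implicitly assumed to be unit-norm, so every dimension is guaranteed to have a value in [-1.0, 1.0]. If this is not the case, use the l2Normalize option.	Bool	false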
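The interaction between `l2Normalize` and `quantize` can be illustrated with a small standalone sketch. It is not MediaPipe's implementation: the helper names `l2Normalized` and `scalarQuantized` are hypothetical, and the scale factor of 128 is an assumption chosen for illustration. The sketch shows why quantization expects unit-norm input: values outside [-1.0, 1.0] are clipped.

```swift
import Foundation

/// Scales a vector to unit L2 norm. A zero vector is returned unchanged.
func l2Normalized(_ v: [Float]) -> [Float] {
  let norm = v.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
  guard norm > 0 else { return v }
  return v.map { $0 / norm }
}

/// Maps each component, assumed to lie in [-1.0, 1.0], to a signed byte.
/// The scale factor (128) is an assumption made for this illustration.
func scalarQuantized(_ v: [Float]) -> [Int8] {
  return v.map { (x: Float) -> Int8 in
    let scaled = (x * 128).rounded()
    // Clamp before converting so out-of-range values saturate instead of trapping.
    return Int8(max(-128, min(127, scaled)))
  }
}

let raw: [Float] = [3, 4, 0]
let unit = l2Normalized(raw)          // approximately [0.6, 0.8, 0.0]
let bytes = scalarQuantized(unit)     // [77, 102, 0]
let clipped = scalarQuantized(raw)    // [127, 127, 0]: information is lost
```

Without normalization, both nonzero components saturate at 127, so their relative magnitudes can no longer be distinguished.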

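When the running mode is set to live stream, the Image Embedder requires the additional imageEmbedderLiveStreamDelegate configuration option, which enables the Image Embedder to deliver image embedding results asynchronously. The delegate must implement the imageEmbedder(_:didFinishEmbedding:timestampInMilliseconds:error:) method, which the Image Embedder calls after it finishes embedding each input image frame.

Option name	Description	Value range	Default value
`imageEmbedderLiveStreamDelegate`	Enables the Image Embedder to deliver image embedding results asynchronously in live stream mode. The class whose instance is set to this property must implement the `imageEmbedder(_:didFinishEmbedding:timestampInMilliseconds:error:)` method.	Not applicable	Not set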

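Prepare data

You need to convert the input image or frame to an MPImage object before passing it to the Image Embedder. MPImage supports different types of iOS image formats and can use them in any running mode for inference. For more information about MPImage, refer to the MPImage API.

Choose an iOS image format based on your use case and the running mode your application requires. MPImage accepts the UIImage, CVPixelBuffer, and CMSampleBuffer iOS image formats.

UIImage

The UIImage format is well-suited for the following running modes:

Images: images from an app bundle, user gallery, or file system that are formatted as UIImage images can be converted to an MPImage object.
Videos: use AVAssetImageGenerator to extract video frames in the CGImage format, then convert them to UIImage images.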

Swift

// Load an image on the user's device as an iOS `UIImage` object
// (named `uiImage` here).

// Convert the `UIImage` object to a MediaPipe's Image object having the default
// orientation `UIImage.Orientation.up`.
let image = try MPImage(uiImage: uiImage)

Objective-C

// Load an image on the user's device as an iOS `UIImage` object
// (named `uiImage` here).

// Convert the `UIImage` object to a MediaPipe's Image object having the default
// orientation `UIImageOrientationUp`.
MPPImage *image = [[MPPImage alloc] initWithUIImage:uiImage error:nil];

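The example initializes an MPImage with the default UIImage.Orientation.up orientation. You can initialize an MPImage with any of the supported UIImage.Orientation values. Image Embedder does not support mirrored orientations such as .upMirrored, .downMirrored, .leftMirrored, and .rightMirrored.

For more information about UIImage, refer to the UIImage Apple Developer Documentation.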

CVPixelBuffer

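The CVPixelBuffer format is well-suited for applications that generate frames and use the iOS CoreImage framework for processing.

The CVPixelBuffer format is well-suited for the following running modes:

Images: apps that generate CVPixelBuffer images after some processing with the iOS CoreImage framework can send them to the Image Embedder in image running mode.
Videos: video frames can be converted to the CVPixelBuffer format for processing and then sent to the Image Embedder in video mode.
Live stream: apps that use an iOS camera to generate frames may convert them to the CVPixelBuffer format for processing before sending them to the Image Embedder in live stream mode.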

Swift

// Obtain a CVPixelBuffer.

// Convert the `CVPixelBuffer` object to a MediaPipe's Image object having the default
// orientation `UIImage.Orientation.up`.
let image = try MPImage(pixelBuffer: pixelBuffer)

Objective-C

// Obtain a CVPixelBuffer.

// Convert the `CVPixelBuffer` object to a MediaPipe's Image object having the
// default orientation `UIImageOrientationUp`.
MPPImage *image = [[MPPImage alloc] initWithPixelBuffer:pixelBuffer error:nil];

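For more information about CVPixelBuffer, refer to the CVPixelBuffer Apple Developer Documentation.

CMSampleBuffer

The CMSampleBuffer format stores media samples of a uniform media type and is well-suited for the live stream running mode. Live frames from iOS cameras are delivered asynchronously in the CMSampleBuffer format by iOS AVCaptureVideoDataOutput.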

Swift

// Obtain a CMSampleBuffer.

// Convert the `CMSampleBuffer` object to a MediaPipe's Image object having the default
// orientation `UIImage.Orientation.up`.
let image = try MPImage(sampleBuffer: sampleBuffer)

Objective-C

// Obtain a `CMSampleBuffer`.

// Convert the `CMSampleBuffer` object to a MediaPipe's Image object having the
// default orientation `UIImageOrientationUp`.
MPPImage *image = [[MPPImage alloc] initWithSampleBuffer:sampleBuffer error:nil];

For more information about CMSampleBuffer, refer to the CMSampleBuffer Apple Developer Documentation.

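Run the task

To run the Image Embedder, use the embed() method specific to the assigned running mode:

Still image: embed(image:)
Video: embed(videoFrame:timestampInMilliseconds:)
Live stream: embedAsync(image:timestampInMilliseconds:)

The following code samples show basic examples of how to run Image Embedder in these different running modes: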

Swift

Image

let result = try imageEmbedder.embed(image: image)

Video

let result = try imageEmbedder.embed(
  videoFrame: image,
  timestampInMilliseconds: timestamp)

Live stream

try imageEmbedder.embedAsync(
  image: image,
  timestampInMilliseconds: timestamp)

Objective-C

Image

MPPImageEmbedderResult *result =
  [imageEmbedder embedImage:image error:nil];

Video

MPPImageEmbedderResult *result =
  [imageEmbedder embedVideoFrame:image
           timestampInMilliseconds:timestamp
                             error:nil];

Live stream

BOOL success =
  [imageEmbedder embedAsyncImage:image
           timestampInMilliseconds:timestamp
                             error:nil];

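The Image Embedder code example shows the implementation of each of these modes in more detail: embed(image:), embed(videoFrame:timestampInMilliseconds:), and embedAsync(image:timestampInMilliseconds:). The example code lets the user switch between processing modes, which your use case may not require.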
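The video and live stream calls take a `timestampInMilliseconds` argument for each frame. As a minimal sketch, assuming frames are decoded from a clip with a known, constant frame rate, the timestamp can be derived from the frame index. The helper name and its parameters are hypothetical and not part of the MediaPipe API.

```swift
import Foundation

/// Timestamp, in milliseconds, of the frame at `index` in a clip with a
/// constant rate of `framesPerSecond`. Hypothetical helper for illustration.
func timestampInMilliseconds(forFrame index: Int, framesPerSecond: Double) -> Int {
  precondition(index >= 0 && framesPerSecond > 0, "index and frame rate must be valid")
  return Int((Double(index) * 1000.0 / framesPerSecond).rounded())
}

// Timestamps of the first four frames of a 30 fps clip.
let timestamps = (0..<4).map {
  timestampInMilliseconds(forFrame: $0, framesPerSecond: 30)
}
// timestamps == [0, 33, 67, 100]
```

Because the values grow with the frame index, frames submitted in order receive increasing timestamps.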

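Note the following:

When running in video mode or live stream mode, you must also provide the timestamp of the input frame to the Image Embedder task.
When running in image or video mode, the Image Embedder task blocks the current thread until it finishes processing the input image or frame. To avoid blocking the current thread, execute the processing on a background thread using the iOS Dispatch or NSOperation frameworks. If your app is written in Swift, you can also use Swift Concurrency for background execution.
When running in live stream mode, the Image Embedder task returns immediately and doesn't block the current thread. After embedding each input frame, it invokes the imageEmbedder(_:didFinishEmbedding:timestampInMilliseconds:error:) method with the result. The Image Embedder invokes this method asynchronously on a dedicated serial dispatch queue. To display results in the user interface, dispatch them to the main queue after processing. If embedAsync is called while the Image Embedder task is busy processing another frame, the Image Embedder ignores the new input frame.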
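The background-execution advice for image and video modes can be sketched with Dispatch. The helper below is a generic illustration, not part of MediaPipe: `runInBackground` and its parameter names are hypothetical. It runs a blocking, throwing closure on a background queue and hands the outcome to a completion handler on a callback queue, which defaults to the main queue.

```swift
import Foundation
import Dispatch

/// Runs `work` on `workQueue` and delivers its outcome on `callbackQueue`.
/// Hypothetical helper; names are illustrative only.
func runInBackground<T>(
  on workQueue: DispatchQueue = .global(qos: .userInitiated),
  callbackQueue: DispatchQueue = .main,
  _ work: @escaping () throws -> T,
  completion: @escaping (Result<T, Error>) -> Void
) {
  workQueue.async {
    // Capture either the returned value or the thrown error.
    let outcome: Result<T, Error> = Result { try work() }
    // Hop to the callback queue before touching UI state.
    callbackQueue.async { completion(outcome) }
  }
}

// Possible usage with the embedder created earlier in this guide:
//
//   runInBackground({ try imageEmbedder.embed(image: image) }) { outcome in
//     switch outcome {
//     case .success(let result): break  // update the UI with `result`
//     case .failure(let error): break   // report `error`
//     }
//   }
```

Keeping the callback queue as a parameter makes the helper usable outside UI code, for example in unit tests that have no running main loop.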

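Handle and display results

After running inference, the Image Embedder returns an ImageEmbedderResult object that contains a list of embeddings (either floating point or scalar-quantized) for the input image.

The following shows an example of the output data from this task: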

ImageEmbedderResult:
  Embedding #0 (sole embedding head):
    float_embedding: {0.0, 0.0, ..., 0.0, 1.0, 0.0, 0.0, 2.0}
    head_index: 0

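This result was obtained by embedding the following image:

[Image: medium shot of an exotic cat]

You can compare the similarity of two embeddings using the ImageEmbedder.cosineSimilarity function.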

Swift

let similarity = try ImageEmbedder.cosineSimilarity(
  embedding1: result.embeddingResult.embeddings[0],
  embedding2: otherResult.embeddingResult.embeddings[0])

Objective-C

NSNumber *similarity = [MPPImageEmbedder
      cosineSimilarityBetweenEmbedding1:result.embeddingResult.embeddings[0]
                          andEmbedding2:otherResult.embeddingResult.embeddings[0]
                                  error:nil];
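To make the comparison concrete, the following standalone sketch shows what a cosine similarity computes for two float embeddings: the dot product divided by the product of the two L2 norms. It is an illustration only, not MediaPipe's implementation; the function name `referenceCosineSimilarity` and the `SimilarityError` type are hypothetical.

```swift
import Foundation

/// Errors for the illustrative helper below (hypothetical type).
enum SimilarityError: Error {
  case lengthMismatch
  case zeroNorm
}

/// Cosine of the angle between two equally sized, nonzero float vectors.
func referenceCosineSimilarity(_ a: [Float], _ b: [Float]) throws -> Double {
  guard a.count == b.count, !a.isEmpty else { throw SimilarityError.lengthMismatch }
  var dot = 0.0
  var normA = 0.0
  var normB = 0.0
  for (x, y) in zip(a, b) {
    dot += Double(x) * Double(y)
    normA += Double(x) * Double(x)
    normB += Double(y) * Double(y)
  }
  guard normA > 0, normB > 0 else { throw SimilarityError.zeroNorm }
  return dot / (normA.squareRoot() * normB.squareRoot())
}

// Same direction gives a value near 1, orthogonal vectors give 0,
// and opposite directions give a value near -1.
let sameDirection = try referenceCosineSimilarity([1, 2, 3], [2, 4, 6])
let orthogonal = try referenceCosineSimilarity([1, 0], [0, 1])
let opposite = try referenceCosineSimilarity([1, 2], [-1, -2])
```

Because the result depends only on direction, two embeddings that differ only in scale compare as identical.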