GPU acceleration delegate for iOS

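Using graphics processing units (GPUs) to run your machine learning (ML) models can dramatically improve the performance of your model and the user experience of your ML-enabled applications. On iOS devices, you can enable GPU-accelerated execution of your models by using a delegate. Delegates act as hardware drivers for LiteRT, allowing you to run the code of your model on GPU processors.

This page describes how to enable GPU acceleration for LiteRT models in iOS apps. For more information about using the GPU delegate for LiteRT, including best practices and advanced techniques, see the GPU delegates page.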

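Use GPU with the Interpreter API

The LiteRT Interpreter API provides a set of general-purpose APIs for building machine learning applications. The following instructions guide you through adding GPU support to an iOS app. This guide assumes you already have an iOS app that can successfully execute an ML model.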

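Modify the Podfile to include GPU support

Starting with the LiteRT 2.3.0 release, the GPU delegate is excluded from the pod to reduce the binary size. You can include it by specifying a subspec for the TensorFlowLiteSwift pod:

pod 'TensorFlowLiteSwift/Metal', '~> 0.0.1-nightly'

or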

pod 'TensorFlowLiteSwift', '~> 0.0.1-nightly', :subspecs => ['Metal']

You can also use the TensorFlowLiteObjC or TensorFlowLiteC pod if you want to use the Objective-C API (available for 2.4.0 and later releases) or the C API.

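Initialize and use the GPU delegate

You can use the GPU delegate with the LiteRT Interpreter API from a number of programming languages. Swift and Objective-C are recommended, but you can also use C++ and C. Using C is required if you are using a version of LiteRT earlier than 2.4. The following code examples outline how to use the delegate with each of these languages.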

Examples, in order: Swift, Objective-C, C++, C (before 2.4.0)

import TensorFlowLite

// Load model ...

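// Initialize the LiteRT interpreter with the GPU delegate.
// The Interpreter initializer throws rather than returning an optional,
// so it is wrapped in do/catch instead of `if let`.
let delegate = MetalDelegate()
do {
  let interpreter = try Interpreter(modelPath: modelPath,
                                    delegates: [delegate])
  // Run inference ...
} catch {
  // Handle the error ...
}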

// Import module when using CocoaPods with module support
@import TFLTensorFlowLite;

// Or import following headers manually
#import "tensorflow/lite/objc/apis/TFLMetalDelegate.h"
#import "tensorflow/lite/objc/apis/TFLTensorFlowLite.h"

// Initialize GPU delegate
TFLMetalDelegate* metalDelegate = [[TFLMetalDelegate alloc] init];

// Initialize interpreter with model path and GPU delegate
TFLInterpreterOptions* options = [[TFLInterpreterOptions alloc] init];
NSError* error = nil;
TFLInterpreter* interpreter = [[TFLInterpreter alloc]
                                initWithModelPath:modelPath
                                          options:options
                                        delegates:@[ metalDelegate ]
                                            error:&error];
if (error != nil) { /* Error handling... */ }

if (![interpreter allocateTensorsWithError:&error]) { /* Error handling... */ }
if (error != nil) { /* Error handling... */ }

// Run inference ...

// Set up interpreter.
auto model = FlatBufferModel::BuildFromFile(model_path);
if (!model) return false;
tflite::ops::builtin::BuiltinOpResolver op_resolver;
std::unique_ptr<Interpreter> interpreter;
InterpreterBuilder(*model, op_resolver)(&interpreter);

// Prepare GPU delegate.
auto* delegate = TFLGpuDelegateCreate(/*default options=*/nullptr);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

// Run inference.
WriteToInputTensor(interpreter->typed_input_tensor<float>(0));
if (interpreter->Invoke() != kTfLiteOk) return false;
ReadFromOutputTensor(interpreter->typed_output_tensor<float>(0));

// Clean up.
TFLGpuDelegateDelete(delegate);

#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/gpu/metal_delegate.h"

// Initialize model
TfLiteModel* model = TfLiteModelCreateFromFile(model_path);

// Initialize interpreter with GPU delegate
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
TfLiteDelegate* metal_delegate = TFLGpuDelegateCreate(nil);  // default config
TfLiteInterpreterOptionsAddDelegate(options, metal_delegate);
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
TfLiteInterpreterOptionsDelete(options);

TfLiteInterpreterAllocateTensors(interpreter);

NSMutableData *inputData = [NSMutableData dataWithLength:input_size * sizeof(float)];
NSMutableData *outputData = [NSMutableData dataWithLength:output_size * sizeof(float)];
TfLiteTensor* input = TfLiteInterpreterGetInputTensor(interpreter, 0);
const TfLiteTensor* output = TfLiteInterpreterGetOutputTensor(interpreter, 0);

// Run inference
TfLiteTensorCopyFromBuffer(input, inputData.bytes, inputData.length);
TfLiteInterpreterInvoke(interpreter);
TfLiteTensorCopyToBuffer(output, outputData.mutableBytes, outputData.length);

// Clean up
TfLiteInterpreterDelete(interpreter);
TFLGpuDelegateDelete(metal_delegate);
TfLiteModelDelete(model);

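GPU API language use notes

LiteRT versions prior to 2.4.0 can only use the C API for Objective-C.
The C++ API is only available when you use bazel or build the library yourself. The C++ API can't be used with CocoaPods.
When using LiteRT with the GPU delegate in C++, get the GPU delegate via the TFLGpuDelegateCreate() function and then pass it to Interpreter::ModifyGraphWithDelegate(), instead of calling Interpreter::AllocateTensors().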

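Build and test with release mode

Change to a release build with the appropriate Metal API accelerator settings to get better performance and for final testing. This section explains how to enable a release build and configure the settings for Metal acceleration.

To change to a release build:

1. Go to Product > Scheme > Edit Scheme... and then select Run.
2. On the Info tab, change Build Configuration to Release and uncheck Debug executable.
3. Click the Options tab, then change GPU Frame Capture to Disabled and Metal API Validation to Disabled.
4. Make sure to select release-only builds on 64-bit architecture. Under Project navigator > tflite_camera_example > PROJECT > your_project_name > Build Settings, set Build Active Architecture Only > Release to Yes.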

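Advanced GPU support

This section covers advanced uses of the GPU delegate for iOS, including delegate options, input and output buffers, and use of quantized models.

Delegate options for iOS

The constructor for the GPU delegate accepts a struct of options in the Swift API, Objective-C API, and C API. Passing nullptr (C API) or nothing (Objective-C and Swift API) to the initializer sets the default options (see the basic usage examples above).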

Examples, in order: Swift, Objective-C, C

// THIS:
var options = MetalDelegate.Options()
options.isPrecisionLossAllowed = false
options.waitType = .passive
options.isQuantizationEnabled = true
let delegate = MetalDelegate(options: options)

// IS THE SAME AS THIS:
let delegate = MetalDelegate()

// THIS:
TFLMetalDelegateOptions* options = [[TFLMetalDelegateOptions alloc] init];
options.precisionLossAllowed = false;
options.waitType = TFLMetalDelegateThreadWaitTypePassive;
options.quantizationEnabled = true;

TFLMetalDelegate* delegate = [[TFLMetalDelegate alloc] initWithOptions:options];

// IS THE SAME AS THIS:
TFLMetalDelegate* delegate = [[TFLMetalDelegate alloc] init];

// THIS:
const TFLGpuDelegateOptions options = {
  .allow_precision_loss = false,
  .wait_type = TFLGpuDelegateWaitTypePassive,
  .enable_quantization = true,
};

// TFLGpuDelegateCreate() takes a pointer to the options struct.
TfLiteDelegate* delegate = TFLGpuDelegateCreate(&options);

// IS THE SAME AS THIS:
TfLiteDelegate* delegate = TFLGpuDelegateCreate(nullptr);

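Input/output buffers using the C++ API

Computation on the GPU requires that the data is available to the GPU. This requirement often means you must perform a memory copy. You should avoid having your data cross the CPU/GPU memory boundary if possible, because this can take a significant amount of time. Usually such a crossing is inevitable, but in some special cases one or the other can be omitted.

If the network's input is an image already loaded in GPU memory (for example, a GPU texture containing the camera feed), it can stay in GPU memory without ever entering CPU memory. Similarly, if the network's output is in the form of a renderable image, such as the result of an image style transfer operation, you can display the result directly on screen.

To achieve the best performance, LiteRT makes it possible for users to read from and write to the TensorFlow hardware buffer directly and bypass avoidable memory copies.

Assuming the image input is in GPU memory, you must first convert it to an MTLBuffer object for Metal. You can associate a TfLiteTensor with a user-prepared MTLBuffer by using the TFLGpuDelegateBindMetalBufferToTensor() function. Note that this function must be called after Interpreter::ModifyGraphWithDelegate(). Additionally, the inference output is, by default, copied from GPU memory to CPU memory. You can turn this behavior off by calling Interpreter::SetAllowBufferHandleOutput(true) during initialization.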

C++

#include "tensorflow/lite/delegates/gpu/metal_delegate.h"
#include "tensorflow/lite/delegates/gpu/metal_delegate_internal.h"

// ...

// Prepare GPU delegate.
auto* delegate = TFLGpuDelegateCreate(nullptr);

if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

interpreter->SetAllowBufferHandleOutput(true);  // disable default gpu->cpu copy
if (!TFLGpuDelegateBindMetalBufferToTensor(
        delegate, interpreter->inputs()[0], user_provided_input_buffer)) {
  return false;
}
if (!TFLGpuDelegateBindMetalBufferToTensor(
        delegate, interpreter->outputs()[0], user_provided_output_buffer)) {
  return false;
}

// Run inference.
if (interpreter->Invoke() != kTfLiteOk) return false;

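Once the default behavior is turned off, copying the inference output from GPU memory to CPU memory requires an explicit call to Interpreter::EnsureTensorDataIsReadable(). This approach also works for quantized models, but you still need to use a float32-sized buffer containing float32 data, because the buffer is bound to the internal de-quantized buffer.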
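The float32 sizing requirement can be sketched as follows. Both functions are hypothetical helpers written for illustration and are not part of the LiteRT API. The first computes how many bytes a bound buffer needs for a given tensor shape, always counting elements as float32 even when the model is quantized. The second assumes your source data happens to be uint8 quantized values and converts it to float32 with the standard affine mapping real = scale * (q - zero_point) before it is written into the bound buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LiteRT): number of bytes a bound buffer
// needs for a tensor of the given shape, always counted as float32.
std::size_t Float32BufferBytes(const std::vector<int>& shape) {
  std::size_t elements = 1;
  for (int dim : shape) {
    if (dim <= 0) return 0;  // treat unknown or empty dimensions as "no data"
    elements *= static_cast<std::size_t>(dim);
  }
  return elements * sizeof(float);
}

// Hypothetical helper: converts uint8 quantized values to float32 using the
// affine mapping real = scale * (q - zero_point).
std::vector<float> DequantizeToFloat32(const std::vector<std::uint8_t>& q,
                                       float scale, int zero_point) {
  std::vector<float> out;
  out.reserve(q.size());
  for (std::uint8_t v : q) {
    out.push_back(scale * static_cast<float>(static_cast<int>(v) - zero_point));
  }
  return out;
}
```

For a 1x224x224x3 input, Float32BufferBytes({1, 224, 224, 3}) evaluates to 602112 bytes on a platform with a 4-byte float, which is the length the user-provided MTLBuffer would need regardless of the model's quantized storage type.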

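Quantized models

The iOS GPU delegate libraries support quantized models by default. You do not need to make any code changes to use quantized models with the GPU delegate. The following section explains how to disable quantized support for testing or experimental purposes.

Disable quantized model support

The following code shows how to disable support for quantized models.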

Examples, in order: Swift, Objective-C, C

    var options = MetalDelegate.Options()
    options.isQuantizationEnabled = false
    let delegate = MetalDelegate(options: options)

    TFLMetalDelegateOptions* options = [[TFLMetalDelegateOptions alloc] init];
    options.quantizationEnabled = false;
    TFLMetalDelegate* delegate = [[TFLMetalDelegate alloc] initWithOptions:options];

    TFLGpuDelegateOptions options = TFLGpuDelegateOptionsDefault();
    options.enable_quantization = false;

    TfLiteDelegate* delegate = TFLGpuDelegateCreate(&options);

For more information about running quantized models with GPU acceleration, see the GPU delegate overview.