
GPU acceleration delegate for iOS

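Using graphics processing units (GPUs) to run your machine learning (ML) models can significantly improve the performance of your model and the user experience of your ML-enabled applications. On iOS devices, you can enable GPU-accelerated execution of your models by using a delegate. Delegates act as hardware drivers for TensorFlow Lite, allowing you to run your model's code on the GPU.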

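This page describes how to enable GPU acceleration for TensorFlow Lite models in iOS apps. For more information about using the GPU delegate for TensorFlow Lite, including best practices and advanced techniques, see the GPU delegates page.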

Use GPU with Interpreter API

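The TensorFlow Lite Interpreter API provides a set of general-purpose APIs for building machine learning applications. The following instructions guide you through adding GPU support to an iOS app. This guide assumes you already have an iOS app that can successfully execute an ML model with TensorFlow Lite.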

Modify the Podfile to include GPU support

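Starting with TensorFlow Lite 2.3.0, the GPU delegate is excluded from the pod to reduce binary size. You can include it by specifying a subspec of the TensorFlowLiteSwift pod, using either of the following forms: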

pod 'TensorFlowLiteSwift/Metal', '~> 0.0.1-nightly'

Or:

pod 'TensorFlowLiteSwift', '~> 0.0.1-nightly', :subspecs => ['Metal']

You can also use TensorFlowLiteObjC if you want to use the Objective-C API (available in version 2.4.0 and later), or TensorFlowLiteC if you want to use the C API.

Initialize and use GPU delegate

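You can use the GPU delegate with the TensorFlow Lite Interpreter API from a number of programming languages. Swift and Objective-C are recommended, but you can also use C++ and C. If you are using a version of TensorFlow Lite earlier than 2.4, you must use C. The following code examples outline how to use the delegate in each of these languages.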

Swift

import TensorFlowLite

// Load model ...

// Initialize TensorFlow Lite interpreter with the GPU delegate.
let delegate = MetalDelegate()
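do {
  // Interpreter's initializer throws on failure; it does not return an optional.
  let interpreter = try Interpreter(modelPath: modelPath,
                                    delegates: [delegate])
  // Run inference ...
} catch {
  // Error handling ...
}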

Objective-C

// Import module when using CocoaPods with module support
@import TFLTensorFlowLite;

// Or import following headers manually
#import "tensorflow/lite/objc/apis/TFLMetalDelegate.h"
#import "tensorflow/lite/objc/apis/TFLTensorFlowLite.h"

// Initialize GPU delegate
TFLMetalDelegate* metalDelegate = [[TFLMetalDelegate alloc] init];

// Initialize interpreter with model path and GPU delegate
TFLInterpreterOptions* options = [[TFLInterpreterOptions alloc] init];
NSError* error = nil;
TFLInterpreter* interpreter = [[TFLInterpreter alloc]
                                initWithModelPath:modelPath
                                          options:options
                                        delegates:@[ metalDelegate ]
                                            error:&error];
if (error != nil) { /* Error handling... */ }

if (![interpreter allocateTensorsWithError:&error]) { /* Error handling... */ }
if (error != nil) { /* Error handling... */ }

// Run inference ...

C++

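#include "tensorflow/lite/delegates/gpu/metal_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Set up interpreter.
auto model = tflite::FlatBufferModel::BuildFromFile(model_path);
if (!model) return false;
tflite::ops::builtin::BuiltinOpResolver op_resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, op_resolver)(&interpreter);

// Prepare GPU delegate (call ModifyGraphWithDelegate() instead of
// Interpreter::AllocateTensors()).
auto* delegate = TFLGpuDelegateCreate(/*default options=*/nullptr);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

// Run inference. WriteToInputTensor() and ReadFromOutputTensor() stand in
// for your app's own input/output handling code.
WriteToInputTensor(interpreter->typed_input_tensor<float>(0));
if (interpreter->Invoke() != kTfLiteOk) return false;
ReadFromOutputTensor(interpreter->typed_output_tensor<float>(0));

// Clean up. The delegate must outlive the interpreter, so destroy the
// interpreter before deleting the delegate.
interpreter.reset();
TFLGpuDelegateDelete(delegate);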

C (before 2.4.0)

#include <stdlib.h>

#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/gpu/metal_delegate.h"

// Initialize model
TfLiteModel* model = TfLiteModelCreateFromFile(model_path);

// Initialize interpreter with GPU delegate
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
TfLiteDelegate* metal_delegate = TFLGpuDelegateCreate(NULL);  // default config
TfLiteInterpreterOptionsAddDelegate(options, metal_delegate);
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
TfLiteInterpreterOptionsDelete(options);

TfLiteInterpreterAllocateTensors(interpreter);

// input_size and output_size are the element counts of the model's
// first input and output tensors.
float* input_data = calloc(input_size, sizeof(float));
float* output_data = calloc(output_size, sizeof(float));
TfLiteTensor* input = TfLiteInterpreterGetInputTensor(interpreter, 0);
const TfLiteTensor* output = TfLiteInterpreterGetOutputTensor(interpreter, 0);

// Run inference
TfLiteTensorCopyFromBuffer(input, input_data, input_size * sizeof(float));
TfLiteInterpreterInvoke(interpreter);
TfLiteTensorCopyToBuffer(output, output_data, output_size * sizeof(float));

// Clean up (delete the interpreter before the delegate it uses)
TfLiteInterpreterDelete(interpreter);
TFLGpuDelegateDelete(metal_delegate);
TfLiteModelDelete(model);
free(input_data);
free(output_data);

GPU API language use

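TensorFlow Lite versions earlier than 2.4.0 can only use the C API from Objective-C.
The C++ API is only available if you build TensorFlow Lite yourself, for example with bazel. The C++ API cannot be used with CocoaPods.
When using TensorFlow Lite with the GPU delegate from C++, get the GPU delegate through the TFLGpuDelegateCreate() function and then pass it to Interpreter::ModifyGraphWithDelegate(), instead of calling Interpreter::AllocateTensors().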

Build and test with release mode

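Switch to a release build with the appropriate Metal API accelerator settings to get better performance and for final testing. This section explains how to enable a release build and configure the settings for Metal acceleration.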

To change to a release build:

Edit the build settings by selecting Product > Scheme > Edit Scheme... and then selecting Run.
On the Info tab, change Build Configuration to Release and uncheck Debug executable.
Click the Options tab, then set GPU Frame Capture to Disabled and Metal API Validation to Disabled.
Make sure to select release-only builds on the 64-bit architecture. Under Project navigator > tflite_camera_example > PROJECT > your_project_name > Build Settings, set Build Active Architecture Only > Release to Yes.

Advanced GPU support

This section covers advanced uses of the GPU delegate for iOS, including delegate options, input and output buffers, and the use of quantized models.

Delegate options for iOS

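The GPU delegate's constructor accepts an options struct in the Swift, Objective-C, and C APIs. Passing a null pointer (C API) or nothing at all (Objective-C and Swift APIs) sets the default options, as in the basic examples under "Initialize and use GPU delegate" above. The examples below spell out those default values explicitly.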

Swift

// THIS:
var options = MetalDelegate.Options()
options.isPrecisionLossAllowed = false
options.waitType = .passive
options.isQuantizationEnabled = true
let delegate = MetalDelegate(options: options)

// IS THE SAME AS THIS:
let delegate = MetalDelegate()

Objective-C

// THIS:
TFLMetalDelegateOptions* options = [[TFLMetalDelegateOptions alloc] init];
options.precisionLossAllowed = false;
options.waitType = TFLMetalDelegateThreadWaitTypePassive;
options.quantizationEnabled = true;

TFLMetalDelegate* delegate = [[TFLMetalDelegate alloc] initWithOptions:options];

// IS THE SAME AS THIS:
TFLMetalDelegate* delegate = [[TFLMetalDelegate alloc] init];

C

// THIS:
const TFLGpuDelegateOptions options = {
  .allow_precision_loss = false,
  .wait_type = TFLGpuDelegateWaitTypePassive,
  .enable_quantization = true,
};

// TFLGpuDelegateCreate() takes a pointer to the options struct.
TfLiteDelegate* delegate = TFLGpuDelegateCreate(&options);

// IS THE SAME AS THIS:
TfLiteDelegate* delegate = TFLGpuDelegateCreate(NULL);
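Each "THIS / IS THE SAME AS THIS" pair above is equivalent because every field set explicitly matches its default value. The following C++ sketch makes that equivalence checkable. `GpuOptionsSketch`, `WaitTypeSketch`, `SameOptions()`, and `ExplicitDefaults()` are hypothetical names used only for illustration. The sketch is a plain struct that mirrors the three options shown above. It is not the real `TFLGpuDelegateOptions`, and it does not call TensorFlow Lite.

```cpp
#include <cassert>

// Hypothetical stand-in for the delegate's wait type; NOT a TensorFlow Lite type.
enum class WaitTypeSketch { kPassive, kActive };

// Hypothetical mirror of the three options used in the examples above.
// Default member initializers encode the documented defaults.
struct GpuOptionsSketch {
  bool allow_precision_loss = false;
  WaitTypeSketch wait_type = WaitTypeSketch::kPassive;
  bool enable_quantization = true;
};

// Field-by-field comparison of two option sets.
bool SameOptions(const GpuOptionsSketch& a, const GpuOptionsSketch& b) {
  return a.allow_precision_loss == b.allow_precision_loss &&
         a.wait_type == b.wait_type &&
         a.enable_quantization == b.enable_quantization;
}

// The "THIS" variant: every field is spelled out explicitly.
GpuOptionsSketch ExplicitDefaults() {
  GpuOptionsSketch options;
  options.allow_precision_loss = false;
  options.wait_type = WaitTypeSketch::kPassive;
  options.enable_quantization = true;
  return options;
}
```

In this model, `SameOptions(ExplicitDefaults(), GpuOptionsSketch{})` is true. Disabling quantization, as shown later on this page, changes only `enable_quantization` and leaves the other two fields at their defaults.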

Input/output buffers using the C++ API

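For the GPU to perform a computation, the data must be available to the GPU. This often means you must perform memory copies. Avoid crossing the CPU/GPU memory boundary where possible, because it can take a significant amount of time. Such crossings are usually unavoidable, but in some special cases one of them can be omitted.

If the network's input is an image already loaded in GPU memory (for example, a GPU texture containing the camera feed), it can stay in GPU memory without ever entering CPU memory. Similarly, if the network's output is a renderable image (for example, from image style transfer), the result can be displayed directly on screen.

To achieve the best performance, TensorFlow Lite lets users read from and write to the TensorFlow hardware buffer directly, bypassing avoidable memory copies.

Assuming the image input is in GPU memory, you must first convert it to a Metal MTLBuffer object. You can associate a TfLiteTensor with a user-prepared MTLBuffer by using the TFLGpuDelegateBindMetalBufferToTensor() function. Note that this function must be called after Interpreter::ModifyGraphWithDelegate(). Additionally, by default the inference output is copied from GPU memory to CPU memory. You can turn this behavior off by calling Interpreter::SetAllowBufferHandleOutput(true) during initialization.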

C++

#include "tensorflow/lite/delegates/gpu/metal_delegate.h"
#include "tensorflow/lite/delegates/gpu/metal_delegate_internal.h"

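// ... Set up the interpreter as in the basic C++ example.
// user_provided_input_buffer and user_provided_output_buffer are
// id<MTLBuffer> objects created by your app, so this code must be
// compiled as Objective-C++.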

// Prepare GPU delegate.
auto* delegate = TFLGpuDelegateCreate(nullptr);

if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

interpreter->SetAllowBufferHandleOutput(true);  // disable default gpu->cpu copy
if (!TFLGpuDelegateBindMetalBufferToTensor(
        delegate, interpreter->inputs()[0], user_provided_input_buffer)) {
  return false;
}
if (!TFLGpuDelegateBindMetalBufferToTensor(
        delegate, interpreter->outputs()[0], user_provided_output_buffer)) {
  return false;
}

// Run inference.
if (interpreter->Invoke() != kTfLiteOk) return false;

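With the default behavior turned off, copying the inference output from GPU memory to CPU memory requires an explicit call to Interpreter::EnsureTensorDataIsReadable() for each output tensor. This approach also works for quantized models. However, you still need to use a float32-sized buffer with float32 data, because the buffer is bound to the internal dequantized buffer.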
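Because a bound buffer holds float32 data even when the model is quantized, its size should be computed from the tensor's element count multiplied by `sizeof(float)`, not by the quantized element size. The helpers below are a hypothetical sketch of that arithmetic. `ElementCount()`, `BoundBufferBytes()`, and `Uint8TensorBytes()` are not TensorFlow Lite functions, and the sketch assumes you already know the tensor shape (for example 1×224×224×3 for an RGB image input).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: total number of elements in a tensor of the given shape.
std::size_t ElementCount(const std::vector<int>& shape) {
  std::size_t elements = 1;
  for (int dim : shape) {
    assert(dim > 0 && "tensor dimensions must be positive");
    elements *= static_cast<std::size_t>(dim);
  }
  return elements;
}

// Hypothetical helper: bytes a user-provided buffer needs when bound to the
// GPU delegate. The bound buffer carries float32 data even for quantized
// models, so the element size is sizeof(float).
std::size_t BoundBufferBytes(const std::vector<int>& shape) {
  return ElementCount(shape) * sizeof(float);
}

// For comparison only: bytes the same tensor occupies as uint8 values.
std::size_t Uint8TensorBytes(const std::vector<int>& shape) {
  return ElementCount(shape) * sizeof(std::uint8_t);
}
```

For a 1×224×224×3 input, `BoundBufferBytes()` gives 602,112 bytes. That is four times the 150,528 bytes the same tensor occupies as uint8 values, so a buffer sized from the quantized element type would be too small.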

Quantized models

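The iOS GPU delegate libraries support quantized models by default. You do not need to change any code to use quantized models with the GPU delegate. The following section explains how to disable quantized model support for testing or experimental purposes.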
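Supporting a quantized model involves converting its integer values back to floating point. TensorFlow Lite's quantization scheme maps a stored integer q to a real value with real = scale × (q − zero_point), using scale and zero-point parameters stored in the model. The CPU sketch below applies that formula to a uint8 tensor with a single scale and zero point. `Dequantize()` is a hypothetical helper for illustration only. It does not show how the GPU delegate actually implements quantized model support.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: affine dequantization of uint8 values with one
// per-tensor scale and zero point: real = scale * (q - zero_point).
std::vector<float> Dequantize(const std::vector<std::uint8_t>& quantized,
                              float scale, std::int32_t zero_point) {
  assert(scale > 0.0f && "scale must be positive");
  std::vector<float> real;
  real.reserve(quantized.size());
  for (std::uint8_t q : quantized) {
    // Widen to int32 before subtracting so values below zero_point go negative.
    const std::int32_t centered = static_cast<std::int32_t>(q) - zero_point;
    real.push_back(scale * static_cast<float>(centered));
  }
  return real;
}
```

With scale 0.5 and zero point 128, the stored values 0, 128, and 255 map to -64.0, 0.0, and 63.5.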

Disable quantized model support

The following code shows how to disable support for quantized models.

Swift

    var options = MetalDelegate.Options()
    options.isQuantizationEnabled = false
    let delegate = MetalDelegate(options: options)

Objective-C

    TFLMetalDelegateOptions* options = [[TFLMetalDelegateOptions alloc] init];
    options.quantizationEnabled = false;
    TFLMetalDelegate* delegate = [[TFLMetalDelegate alloc] initWithOptions:options];

C

    TFLGpuDelegateOptions options = TFLGpuDelegateOptionsDefault();
    options.enable_quantization = false;

    TfLiteDelegate* delegate = TFLGpuDelegateCreate(&options);

For more information about running quantized models with GPU acceleration, see the GPU delegate overview.