New: LiteRT, Google's high-performance runtime for on-device AI, formerly known as TensorFlow Lite.


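Get started with LiteRT

This guide shows how to run a LiteRT model on-device to make predictions based on input data. Inference runs through the LiteRT interpreter, which uses a static graph ordering and a custom (less dynamic) memory allocator to keep load, initialization, and execution latency to a minimum.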

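LiteRT inference typically follows these steps:

Loading a model: load the .tflite model, which contains the model's execution graph, into memory.
Transforming data: convert the input data into the format and dimensions the model expects. Raw input data generally does not match the input format the model requires; for example, you might need to resize an image or change its image format to be compatible with the model.
Running inference: execute the LiteRT model to make predictions. This step uses the LiteRT API to execute the model and involves a few sub-steps, such as building the interpreter and allocating tensors.
Interpreting output: interpret the output tensors in a way that is meaningful for your application. For example, a model might return only a list of probabilities; it's up to you to map the probabilities to relevant categories and format the output.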
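The data-transformation and output-interpretation steps don't depend on any LiteRT API. The following is a minimal, framework-free Python sketch of both, assuming a hypothetical image given as rows of 8-bit (R, G, B) tuples, a model that expects float values in [0, 1] in height, width, channel (HWC) order, and a hypothetical label list. Real models define their own input size, scaling, and labels.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # one 8-bit (R, G, B) pixel


def resize_nearest(image: Sequence[Sequence[Pixel]], out_h: int, out_w: int) -> List[List[Pixel]]:
    """Nearest-neighbor resize of an image given as rows of RGB tuples."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_float_input(image: Sequence[Sequence[Pixel]]) -> List[float]:
    """Flatten in height, width, channel order and scale 0-255 values into [0, 1]."""
    return [channel / 255.0 for row in image for pixel in row for channel in pixel]


def top_k(probabilities: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Pair each probability with its label and keep the k highest-scoring pairs."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # A 2x2 image upscaled to 4x4 and flattened, e.g. for a hypothetical [1, 4, 4, 3] input.
    tiny = [[(0, 0, 0), (255, 255, 255)], [(255, 0, 0), (0, 0, 255)]]
    model_input = to_float_input(resize_nearest(tiny, 4, 4))
    print(len(model_input))  # 48
    # Pretend these probabilities came from the model's output tensor.
    print(top_k([0.1, 0.7, 0.2], ["cat", "dog", "bird"], k=2))  # [('dog', 0.7), ('bird', 0.2)]
```

In a real pipeline, the flattened list would be converted to the model's input dtype and shape (for example with NumPy) before it is copied into the input tensor.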

This guide describes how to access the LiteRT interpreter and run inference using C++, Java, and Python.

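Supported platforms

The LiteRT inference APIs are provided for the most common mobile and embedded platforms, such as Android, iOS, and Linux, in multiple programming languages.

In most cases, the API design favors performance over ease of use. LiteRT is designed for fast inference on small devices, so the APIs avoid unnecessary copies at the expense of convenience.

Across all libraries, the LiteRT API lets you load models, feed inputs, and retrieve inference outputs.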

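Android platform

On Android, LiteRT inference can be performed using either the Java or the C++ API. The Java API is convenient and can be used directly within your Android Activity classes. The C++ API offers more flexibility and speed, but may require writing JNI wrappers to move data between the Java and C++ layers.

For details, see the C++ and Java sections, or follow the Android quickstart.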

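iOS platform

On iOS, LiteRT is available through the Swift and Objective-C iOS libraries. You can also use the C API directly in Objective-C code.

See the Swift, Objective-C, and C API sections, or follow the iOS quickstart.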

Linux platform

On Linux platforms, you can run inference using the LiteRT C++ API.

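Load and run a model

Loading and running a LiteRT model involves the following steps:

Loading the model into memory.
Building an Interpreter based on the existing model.
Setting input tensor values.
Invoking inference.
Reading the output tensor values.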

Android (Java)

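The Java API for running inference with LiteRT is primarily designed for use on Android, so it's available as an Android library dependency: com.google.ai.edge.litert.

In Java, you use the Interpreter class to load a model and drive model inference. In many cases, this may be the only API you need.

You can initialize an Interpreter using a FlatBuffers (.tflite) file: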

public Interpreter(@NotNull File modelFile);

Or with a MappedByteBuffer:

public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);

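In both cases, you must provide a valid LiteRT model, or the API throws IllegalArgumentException. If you use a MappedByteBuffer to initialize an Interpreter, it must remain unchanged for the whole lifetime of the Interpreter.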
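Before handing a file or MappedByteBuffer to the Interpreter, you can cheaply reject files that are obviously not LiteRT models. The following Python sketch rests on an assumption about the file layout: it relies on the FlatBuffers convention that bytes 4 to 7 of a .tflite file hold the file identifier TFL3. It maps the file read-only with mmap (the Python counterpart of MappedByteBuffer), so the mapped bytes can't be modified through the mapping while in use. The helper name looks_like_tflite is hypothetical, and passing this check does not guarantee that the Interpreter will accept the model.

```python
import mmap

# Assumption: LiteRT (.tflite) FlatBuffers carry this identifier at bytes 4-7,
# right after the 4-byte root table offset.
TFLITE_FILE_IDENTIFIER = b"TFL3"


def looks_like_tflite(path: str) -> bool:
    """Memory-map the file read-only and compare its FlatBuffers file identifier."""
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so reject files shorter than 8 bytes first.
        if f.seek(0, 2) < 8:
            return False
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[4:8] == TFLITE_FILE_IDENTIFIER
```

This is only a heuristic filter for clearer error messages; the Interpreter's own validation remains the authority on whether a model is usable.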

The preferred way to run inference on a model is to use signatures, which are available for models converted starting with TensorFlow 2.5:

// file_of_tensorflowlite_model is a java.io.File pointing to the .tflite model;
// input1, input2, and output1 are arrays or buffers allocated by the caller.
try (Interpreter interpreter = new Interpreter(file_of_tensorflowlite_model)) {
  Map<String, Object> inputs = new HashMap<>();
  inputs.put("input_1", input1);
  inputs.put("input_2", input2);
  Map<String, Object> outputs = new HashMap<>();
  outputs.put("output_1", output1);
  interpreter.runSignature(inputs, outputs, "mySignature");
}

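The runSignature method takes three arguments:

Inputs: a map from input names in the signature to input objects.
Outputs: a map from output names in the signature to output data.
Signature name (optional): the name of the signature to run. It can be left empty if the model has a single signature.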
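The Python Interpreter exposes the same signature metadata through get_signature_list(), which returns a dictionary of the form {signature_name: {'inputs': [...], 'outputs': [...]}}. The sketch below, built around a hypothetical helper check_signature_inputs that is not part of the LiteRT API, mirrors the runSignature rules described above: the signature name may be omitted only when the model has a single signature, and the input map must use exactly the signature's input names.

```python
from typing import List, Mapping, Optional


def check_signature_inputs(
    signatures: Mapping[str, Mapping[str, List[str]]],
    inputs: Mapping[str, object],
    signature_key: Optional[str] = None,
) -> str:
    """Resolve the signature name and verify that `inputs` uses its input names."""
    if signature_key is None:
        # Like runSignature, only a single-signature model allows omitting the name.
        if len(signatures) != 1:
            raise ValueError("a signature name is required when the model has several signatures")
        signature_key = next(iter(signatures))
    if signature_key not in signatures:
        raise KeyError(f"unknown signature: {signature_key!r}")
    expected = set(signatures[signature_key]["inputs"])
    given = set(inputs)
    if expected != given:
        raise ValueError(
            f"missing inputs {sorted(expected - given)}, unexpected inputs {sorted(given - expected)}"
        )
    return signature_key
```

With a real interpreter, `signatures` would come from interpreter.get_signature_list(), and the returned name could be passed to interpreter.get_signature_runner().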

Another way to run inference, when the model doesn't have defined signatures, is to call Interpreter.run(). For example:

try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}

The run() method takes only one input and returns only one output. If your model has multiple inputs or multiple outputs, use the following instead:

interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);

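In this case, each entry in inputs corresponds to an input tensor, and map_of_indices_to_outputs maps the indices of output tensors to the corresponding output data.

In both cases, the tensor indices should correspond to the values you gave to the LiteRT Converter when you created the model. Be aware that the order of tensors in input must match the order given to the LiteRT Converter.

The Interpreter class also provides convenience functions for getting the index of any model input or output by its operation name: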

public int getInputIndex(String opName);
public int getOutputIndex(String opName);

If opName is not a valid operation in the model, an IllegalArgumentException is thrown.
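The Python Interpreter has no getInputIndex(); a similar lookup can be built from get_input_details() or get_output_details(), whose entries include a 'name' and an 'index'. In Python, 'index' is the tensor index that set_tensor() and get_tensor() expect. The helper index_of below is a hypothetical sketch, not part of the LiteRT API; it raises ValueError where the Java API throws IllegalArgumentException.

```python
from typing import Iterable, Mapping


def index_of(details: Iterable[Mapping[str, object]], name: str) -> int:
    """Return the tensor index of the input/output called `name`."""
    for detail in details:
        if detail["name"] == name:
            return int(detail["index"])
    # Counterpart of the IllegalArgumentException thrown by the Java API.
    raise ValueError(f"no tensor named {name!r}")
```

For example, index_of(interpreter.get_input_details(), "input_1") would give the index to pass to interpreter.set_tensor().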

Also note that Interpreter owns resources. To avoid memory leaks, release them after use with:

interpreter.close();

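For an example project using Java, see the Android object detection example app.

Supported data types

To use LiteRT, the data types of the input and output tensors must be one of the following primitive types: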

float
int
long
byte

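String types are also supported, but they are encoded differently than the primitive types. In particular, the shape of a string tensor dictates the number and arrangement of strings in the tensor, with each element itself being a variable-length string. In this sense, the (byte) size of the tensor can't be computed from the shape and type alone, and consequently strings can't be provided as a single, flat ByteBuffer argument.

If other data types, including boxed types such as Integer and Float, are used, an IllegalArgumentException is thrown.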
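To see why primitive tensors can be passed as one flat buffer while string tensors cannot, note that a dense tensor's byte size is simply the product of its dimensions times the element size. The Python sketch below computes this for the four Java primitives listed above, using struct's standard sizes (float and int are 4 bytes, long is 8, byte is 1). The helper name tensor_num_bytes and the type names used as keys are assumptions for illustration.

```python
import math
import struct
from typing import Sequence

# struct format codes with standard (not platform-dependent) sizes, keyed by Java type name.
ELEMENT_FORMATS = {"float": "<f", "int": "<i", "long": "<q", "byte": "<b"}


def tensor_num_bytes(shape: Sequence[int], element_type: str) -> int:
    """Bytes needed for a dense tensor of fixed-size primitive elements."""
    if element_type not in ELEMENT_FORMATS:
        # Strings have no fixed element size, so shape and type alone can't give a size.
        raise ValueError(f"unsupported element type: {element_type}")
    return math.prod(shape) * struct.calcsize(ELEMENT_FORMATS[element_type])


print(tensor_num_bytes([1, 224, 224, 3], "float"))  # 602112
```

This is the size a raw ByteBuffer input would need for a [1, 224, 224, 3] float tensor.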

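Inputs

Each input should be an array or multi-dimensional array of the supported primitive types, or a raw ByteBuffer of the appropriate size. If the input is an array or multi-dimensional array, the associated input tensor is implicitly resized to the array's dimensions at inference time. If the input is a ByteBuffer, the caller should first manually resize the associated input tensor (via Interpreter.resizeInput()) before running inference.

When using ByteBuffer, prefer direct byte buffers, as this allows the Interpreter to avoid unnecessary copies. If the ByteBuffer is a direct byte buffer, its order must be ByteOrder.nativeOrder(). After it is used for a model inference, it must remain unchanged until the model inference has finished.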
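The byte-order requirement matters because Java's ByteBuffer defaults to big-endian order, while Android and iOS devices are typically little-endian. The following Python sketch illustrates the pitfall with struct and array rather than Java: values packed in native order round-trip correctly, while the same values packed big-endian decode to different numbers on a little-endian machine. The helper names are hypothetical.

```python
import array
import struct
import sys
from typing import List, Sequence


def pack_native(values: Sequence[float]) -> bytes:
    """Pack float32 values in native byte order ("=" means native order, standard sizes)."""
    return struct.pack(f"={len(values)}f", *values)


def read_as_native_floats(raw: bytes) -> List[float]:
    """Decode bytes the way a native-order consumer would (array uses native order)."""
    decoded = array.array("f")
    decoded.frombytes(raw)
    return decoded.tolist()


values = [1.0, -2.5, 3.25]
assert read_as_native_floats(pack_native(values)) == values
# Big-endian bytes, like a Java ByteBuffer whose order was never changed.
wrong = struct.pack(f">{len(values)}f", *values)
print(sys.byteorder, read_as_native_floats(wrong) == values)  # False on little-endian machines
```

In Java, the equivalent fix is to call order(ByteOrder.nativeOrder()) on the buffer before writing into it.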

Outputs

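Each output should be an array or multi-dimensional array of the supported primitive types, or a ByteBuffer of the appropriate size. Note that some models have dynamic outputs, where the shape of the output tensors can vary depending on the input. There's no straightforward way of handling this with the existing Java inference API, but planned extensions will make it possible.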

iOS (Swift)

The Swift API is available in the TensorFlowLiteSwift Pod from CocoaPods.

First, import the TensorFlowLite module.

import Foundation
import TensorFlowLite

// Get the path of the model bundled with the app.
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling...
  fatalError("model.tflite was not found in the app bundle")
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  var inputData = Data()  // Should be filled with the model's input bytes.

  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy output to a Float32 buffer to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, { x, y in x * y })
  let outputData =
        UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  defer { outputData.deallocate() }
  outputTensor.data.copyBytes(to: outputData)

  // Process the inference results in `outputData`...
} catch {
  // Error handling...
}

iOS (Objective-C)

The Objective-C API is available in the LiteRTObjC Pod from CocoaPods.

First, import the TensorFlowLiteObjC module.

@import TensorFlowLite;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Get the input `TFLTensor`
TFLTensor *inputTensor = [interpreter inputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy the input data to the input `TFLTensor`.
[inputTensor copyData:inputData error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }

Using the C API in Objective-C code

The Objective-C API does not support delegates. To use delegates with Objective-C code, you need to call the underlying C API directly.

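#include <vector>

#include "tensorflow/lite/c/c_api.h"

// `input` and `output` below are assumed to be std::vector<float> buffers,
// so this snippet must be compiled as Objective-C++ (a .mm file).
TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
// Attach delegates to `options` here, for example with
// TfLiteInterpreterOptionsAddDelegate(options, delegate);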

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);

C++

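The C++ API for running inference with LiteRT is compatible with Android, iOS, and Linux platforms. On iOS, the C++ API is only available when building with Bazel.

In C++, the model is stored in the FlatBufferModel class. It encapsulates a LiteRT model, and you can build it in a couple of different ways, depending on where the model is stored: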

class FlatBufferModel {
  // Build a model based on a file. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromFile(
      const char* filename,
      ErrorReporter* error_reporter);

  // Build a model based on a pre-loaded flatbuffer. The caller retains
  // ownership of the buffer and should keep it alive until the returned object
  // is destroyed. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
      const char* buffer,
      size_t buffer_size,
      ErrorReporter* error_reporter);
};

Now that you have the model as a FlatBufferModel object, you can execute it with an Interpreter. A single FlatBufferModel can be used simultaneously by more than one Interpreter.

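The important parts of the Interpreter API are shown in the code snippet below. Note that:

Tensors are represented by integers, in order to avoid string comparisons (and any fixed dependency on string libraries).
An interpreter must not be accessed from concurrent threads.
Memory allocation for input and output tensors must be triggered by calling AllocateTensors() right after resizing tensors.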
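Because an interpreter must not be accessed from concurrent threads, code that shares one interpreter across threads needs to serialize the whole set-input, invoke, read-output sequence. The notes above concern the C++ API; the sketch below applies the same rule in Python under stated assumptions: LockedRunner is a hypothetical wrapper, and FakeInterpreter is a hypothetical stand-in that only mimics the set_tensor(), invoke(), and get_tensor() methods of the Python Interpreter so the example runs without a model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedRunner:
    """Hypothetical wrapper that lets many threads share one interpreter safely."""

    def __init__(self, interpreter, input_index: int, output_index: int):
        self._interpreter = interpreter
        self._input_index = input_index
        self._output_index = output_index
        self._lock = threading.Lock()

    def run(self, value):
        # Hold the lock across the whole sequence so another thread cannot
        # overwrite the input tensor between set_tensor() and get_tensor().
        with self._lock:
            self._interpreter.set_tensor(self._input_index, value)
            self._interpreter.invoke()
            return self._interpreter.get_tensor(self._output_index)


class FakeInterpreter:
    """Stand-in with one input (index 0) and one output (index 1): output = input + 4."""

    def __init__(self):
        self._tensors = {0: 0, 1: 0}

    def set_tensor(self, index, value):
        self._tensors[index] = value

    def invoke(self):
        self._tensors[1] = self._tensors[0] + 4

    def get_tensor(self, index):
        return self._tensors[index]


runner = LockedRunner(FakeInterpreter(), input_index=0, output_index=1)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(runner.run, range(100)))
assert results == [x + 4 for x in range(100)]
```

A lock keeps memory use low but processes one request at a time. The alternative is one interpreter per thread; in C++, those interpreters can share a single FlatBufferModel, as noted above.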

The simplest usage of LiteRT with C++ looks like this:

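#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Load the model from `filename` (path to the .tflite file).
// BuildFromFile returns nullptr on failure.
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);
if (!model) { /* Error handling... */ }

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
if (tflite::InterpreterBuilder(*model, resolver)(&interpreter) != kTfLiteOk) {
  /* Error handling... */
}

// Resize input tensors, if needed, then allocate tensor memory.
if (interpreter->AllocateTensors() != kTfLiteOk) { /* Error handling... */ }

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

if (interpreter->Invoke() != kTfLiteOk) { /* Error handling... */ }

float* output = interpreter->typed_output_tensor<float>(0);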

For more example code, see minimal.cc and label_image.cc.

Python

The Python API for running inference uses the Interpreter class to load a model and run inference.

Install the LiteRT package:

$ python3 -m pip install ai-edge-litert

Import the LiteRT interpreter:

from ai_edge_litert.interpreter import Interpreter
# Replace the path with the location of your .tflite model.
interpreter = Interpreter(model_path="model.tflite")

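The following example shows how to use the Python interpreter to load a FlatBuffers (.tflite) file and run inference.

This example is recommended if you're converting from a SavedModel with a defined SignatureDef: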

import tensorflow as tf

class TestModel(tf.Module):
  def __init__(self):
    super(TestModel, self).__init__()

  @tf.function(input_signature=[tf.TensorSpec(shape=[1, 10], dtype=tf.float32)])
  def add(self, x):
    '''
    Simple method that accepts single input 'x' and returns 'x' + 4.
    '''
    # Name the output 'result' for convenience.
    return {'result' : x + 4}

SAVED_MODEL_PATH = 'content/saved_models/test_variable'
TFLITE_FILE_PATH = 'content/test_variable.tflite'

# Save the model
module = TestModel()
# You can omit the signatures argument and a default signature name will be
# created with name 'serving_default'.
tf.saved_model.save(
    module, SAVED_MODEL_PATH,
    signatures={'my_signature':module.add.get_concrete_function()})

# Convert the model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
tflite_model = converter.convert()
with open(TFLITE_FILE_PATH, 'wb') as f:
  f.write(tflite_model)

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)

# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# my_signature is callable with input as arguments.
output = my_signature(x=tf.constant([1.0], shape=(1,10), dtype=tf.float32))
# 'output' is dictionary with all outputs from the inference.
# In this case we have single output 'result'.
print(output['result'])

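Another example, for a model that doesn't have SignatureDefs defined. It reuses TFLITE_FILE_PATH from the previous example and runs inference with random input data: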

import numpy as np
import tensorflow as tf

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

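As an alternative to loading the model as a pre-converted .tflite file, you can combine your code with the LiteRT Converter, which lets you convert a Keras model to the LiteRT format and then run inference: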

import numpy as np
import tensorflow as tf

img = tf.keras.Input(shape=(64, 64, 3), name="img")
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to LiteRT format
converter = tf.lite.TFLiteConverter.from_keras_model(tf.keras.models.Model(inputs=[img], outputs=[out]))
tflite_model = converter.convert()

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...

For more Python sample code, see label_image.py.

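Run inference with a dynamic-shape model

If you want to run a model with a dynamic input shape, resize the input shape before running inference. Otherwise, the None dimensions of the TensorFlow model are replaced by a placeholder of 1 in the LiteRT model.

The following examples show how to resize the input shape before running inference in different languages. All the examples assume that the input shape is defined as [1/None, 10] and needs to be resized to [3, 10].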
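In the Python API, the input details returned by get_input_details() include both 'shape', where a dynamic dimension appears as the placeholder 1, and 'shape_signature', where it appears as -1. The following sketch uses a hypothetical helper, resolve_shape, to fill in the dynamic dimensions before calling resize_tensor_input(); it's an illustration, not part of the LiteRT API.

```python
from typing import List, Sequence


def resolve_shape(shape_signature: Sequence[int], dynamic_sizes: Sequence[int]) -> List[int]:
    """Replace each -1 in `shape_signature`, left to right, with the next requested size."""
    dynamic_count = sum(1 for dim in shape_signature if dim == -1)
    if dynamic_count != len(dynamic_sizes):
        raise ValueError(
            f"expected {dynamic_count} sizes for {list(shape_signature)}, got {len(dynamic_sizes)}"
        )
    sizes = iter(dynamic_sizes)
    return [next(sizes) if dim == -1 else int(dim) for dim in shape_signature]


print(resolve_shape([-1, 10], [3]))  # [3, 10]

# Hypothetical usage with a loaded interpreter (not run here):
# detail = interpreter.get_input_details()[0]
# interpreter.resize_tensor_input(detail["index"], resolve_shape(detail["shape_signature"], [3]))
# interpreter.allocate_tensors()
```

Working from shape_signature rather than shape avoids mistaking a genuine size-1 dimension for a dynamic one.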

C++ example:

// Resize input tensors before allocate tensors
interpreter->ResizeInputTensor(/*tensor_index=*/0, std::vector<int>{3,10});
interpreter->AllocateTensors();

Python example:

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_path=TFLITE_FILE_PATH)

# Resize input shape for dynamic shape model and allocate tensor
interpreter.resize_tensor_input(interpreter.get_input_details()[0]['index'], [3, 10])
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()