Introducing LiteRT: Google's high-performance runtime for on-device AI, formerly known as TensorFlow Lite.


Get started with LiteRT

This guide introduces the process of running a LiteRT (short for Lite Runtime) model on-device to make predictions based on input data. This is done with the LiteRT interpreter, which uses a static graph ordering and a custom (less dynamic) memory allocator to keep load, initialization, and execution latency to a minimum.

LiteRT inference typically follows these steps:

Loading a model: load the .tflite model, which contains the model's execution graph, into memory.
Transforming data: transform the input data into the format and dimensions the model expects. Raw input data generally does not match the input format the model expects. For example, you might need to resize an image or change its format to be compatible with the model.
Running inference: execute the LiteRT model to make predictions. This step uses the LiteRT API to run the model and involves a few sub-steps, such as building the interpreter and allocating tensors.
Interpreting output: interpret the output tensors in a way that is meaningful and useful in your application. For example, a model might return only a list of probabilities. It is up to you to map those probabilities to the relevant categories and format the output.
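The data-transformation and output-interpretation steps are ordinary application code rather than LiteRT calls. The Python sketch below shows one common form of each. The mean/std of 127.5 (which maps 0-255 pixel values to [-1, 1]) and the label list are assumptions for illustration; use whatever preprocessing and labels your model was actually trained with.

```python
def normalize_pixels(pixels, mean=127.5, std=127.5):
    """Map 8-bit pixel values (0-255) to floats.

    The default mean/std (an assumption) map 0..255 onto -1..1; check what
    your model expects before reusing these values.
    """
    return [(p - mean) / std for p in pixels]


def top_k(probabilities, labels, k=3):
    """Pair each output probability with its label and keep the k most likely."""
    if len(probabilities) != len(labels):
        raise ValueError("expected exactly one label per output element")
    ranked = sorted(zip(labels, probabilities), key=lambda lp: lp[1], reverse=True)
    return ranked[:k]
```

For example, with the hypothetical labels `["cat", "dog", "bird"]`, `top_k([0.1, 0.7, 0.2], labels, k=2)` returns `[('dog', 0.7), ('bird', 0.2)]`.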

This guide describes how to access the LiteRT interpreter and perform inference using C++, Java, and Python.

Supported platforms

We provide TensorFlow inference APIs for the most common mobile and embedded platforms, such as Android, iOS, and Linux, in multiple programming languages.

In most cases, the API design reflects a preference for performance over ease of use. LiteRT is designed for fast inference on small devices, so the APIs avoid unnecessary copies, even at the expense of convenience.

Across all libraries, the LiteRT API lets you load models, feed inputs, and retrieve inference outputs.

Android platform

On Android, you can perform LiteRT inference using either the Java or the C++ API. The Java API is convenient and can be used directly within your Android Activity classes. The C++ API offers more flexibility and speed, but may require writing JNI wrappers to move data between the Java and C++ layers.

See the C++ and Java sections for more information, or follow the Android quickstart.

iOS platform

On iOS, LiteRT is available in the Swift and Objective-C iOS libraries. You can also use the C API directly in Objective-C code.

See the Swift, Objective-C, and C API sections, or follow the iOS quickstart.

Linux platform

On Linux platforms, you can run inference using the LiteRT APIs available in C++.

Load and run a model

Loading and running a LiteRT model involves the following steps:

Loading the model into memory.
Building an Interpreter based on an existing model.
Setting input tensor values.
Invoking inference.
Reading the output tensor values.

Android (Java)

The Java API for running inference with LiteRT is primarily designed for use with Android, so it is available as an Android library dependency: com.google.ai.edge.litert.

In Java, you use the Interpreter class to load a model and drive model inference. In many cases, this may be the only API you need.

You can initialize an Interpreter using a FlatBuffers (.tflite) file:

public Interpreter(@NotNull File modelFile);

Or with a MappedByteBuffer:

public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);

In both cases, you must provide a valid LiteRT model, or the API throws an IllegalArgumentException. If you use a MappedByteBuffer to initialize an Interpreter, the buffer must remain unchanged for the whole lifetime of the Interpreter.

The preferred way to run inference on a model is to use signatures, which are available for models converted starting with TensorFlow 2.5:

try (Interpreter interpreter = new Interpreter(file_of_tensorflowlite_model)) {
  Map<String, Object> inputs = new HashMap<>();
  inputs.put("input_1", input1);
  inputs.put("input_2", input2);
  Map<String, Object> outputs = new HashMap<>();
  outputs.put("output_1", output1);
  interpreter.runSignature(inputs, outputs, "mySignature");
}

The runSignature method takes three arguments:

Inputs: a map from input names in the signature to input objects.
Outputs: a map from output names in the signature to output data.
Signature name (optional): the name of the signature (can be left empty if the model has a single signature).
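Because runSignature matches inputs and outputs by name, a mismatch between your map keys and the signature's names is an easy mistake to make. The Python API exposes the same names through `Interpreter.get_signature_list()`. The hypothetical helper below assumes that method returns a dict shaped like `{"my_signature": {"inputs": [...], "outputs": [...]}}` and reports missing or unexpected input names before you run inference; it is a sketch, not part of LiteRT.

```python
def check_signature_inputs(signature_list, signature_name, provided):
    """Compare caller-supplied input names with the names a signature declares.

    `signature_list` is assumed to have the shape returned by the Python
    Interpreter.get_signature_list(); `provided` is any iterable of names
    (for example the keys of an inputs dict). Returns (missing, unexpected).
    """
    expected = set(signature_list[signature_name]["inputs"])
    given = set(provided)
    return sorted(expected - given), sorted(given - expected)
```

An empty `missing` list together with an empty `unexpected` list means the inputs line up with the signature by name; shapes and types still have to match separately.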

If the model does not have signatures defined, you can run inference another way: simply call Interpreter.run(). For example:

try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}

The run() method takes only one input and returns only one output. So if your model has multiple inputs or multiple outputs, use this instead:

interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);

In this case, each entry in inputs corresponds to an input tensor, and map_of_indices_to_outputs maps the indices of output tensors to the corresponding output data.

In both cases, the tensor indices should correspond to the values you gave the LiteRT Converter when you created the model. Note that the order of tensors in input must match the order given to the LiteRT Converter.

The Interpreter class also provides convenience functions to get the index of any model input or output from an operation name:

public int getInputIndex(String opName);
public int getOutputIndex(String opName);

If opName is not a valid operation in the model, it throws an IllegalArgumentException.
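The Python Interpreter has no direct counterpart to getInputIndex, but get_input_details() and get_output_details() return lists of dicts that include "name" and "index" keys. The hypothetical helper below performs the same name-to-index lookup over such a list and, like the Java method, fails loudly on an unknown name (here with a ValueError).

```python
def index_by_name(details, name):
    """Return the tensor index whose 'name' matches, similar to Java's getInputIndex.

    `details` is a list of dicts with at least 'name' and 'index' keys, such as
    the result of get_input_details() / get_output_details() in Python.
    """
    for d in details:
        if d["name"] == name:
            return d["index"]
    known = ", ".join(repr(d["name"]) for d in details)
    raise ValueError(f"{name!r} is not a tensor in this model (known: {known})")
```

For example, `index_by_name(interpreter.get_input_details(), "input_1")` would return the index to pass to set_tensor(), assuming the model actually has an input with that name.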

Also note that the Interpreter owns resources. To avoid memory leaks, you must release those resources after use by calling:

interpreter.close();

For an example project that uses Java, see the Android object detection example app.

Supported data types

To use LiteRT, the data types of the input and output tensors must be one of the following primitive types:

float
int
long
byte

String types are also supported, but they are encoded differently from the primitive types. In particular, the shape of a string Tensor dictates the number and arrangement of strings in the Tensor, with each element being a variable-length string. As a result, the size (in bytes) of the Tensor cannot be computed from its shape and type alone, so strings cannot be provided as a single, flat ByteBuffer argument.
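A small Python illustration of why shape and type are enough for primitive tensors but not for strings. This does not reproduce LiteRT's actual string encoding (which also stores offsets and headers); it only counts the UTF-8 payload to show that two string tensors with the same shape can need different numbers of bytes.

```python
import math

FLOAT32_BYTES = 4


def fixed_size_bytes(shape, itemsize=FLOAT32_BYTES):
    """Byte size of a primitive tensor: fully determined by shape and element type."""
    return math.prod(shape) * itemsize


def string_payload_bytes(strings):
    """UTF-8 bytes needed for the characters of a flattened string tensor.

    Only the raw text is counted; real serialized string tensors carry extra
    metadata that this sketch deliberately ignores.
    """
    return sum(len(s.encode("utf-8")) for s in strings)
```

Both `["hi", "yo"]` and `["hello", "xin chào"]` have shape [2], yet their payloads are 4 and 14 bytes respectively, while any float32 tensor of shape [2] is always 8 bytes.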

If any other data type is used, including boxed types such as Integer and Float, an IllegalArgumentException is thrown.

Inputs

Each input should be an array or multi-dimensional array of the supported primitive types, or a raw ByteBuffer of the appropriate size. If the input is an array or multi-dimensional array, the associated input tensor is implicitly resized to the array's dimensions at inference time. If the input is a ByteBuffer, the caller must first resize the associated input tensor manually (via Interpreter.resizeInput()) before running inference.
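To make "implicitly resized to the array's dimensions" concrete: the dimensions come from the nesting of the array itself, so a 3 x 10 array yields a [3, 10] tensor. The hypothetical Python helper below infers those dimensions from a nested list and rejects ragged input that could not map onto a tensor. It is an illustration of the idea, not how the Java binding is implemented.

```python
def infer_shape(data):
    """Infer the dimensions of a rectangular nested list, e.g. 3 rows of 10 -> [3, 10]."""
    shape = []
    level = data
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    _check_rectangular(data, shape)
    return shape


def _check_rectangular(data, shape):
    """Raise ValueError unless every sub-list matches the inferred dimensions."""
    if not shape:
        # Past the last dimension we expect scalars, not further nesting.
        if isinstance(data, list):
            raise ValueError("ragged array: cannot be used as a tensor input")
        return
    if not isinstance(data, list) or len(data) != shape[0]:
        raise ValueError("ragged array: cannot be used as a tensor input")
    for item in data:
        _check_rectangular(item, shape[1:])
```

With a raw ByteBuffer there is no nesting to inspect, which is why the tensor must be resized explicitly in that case.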

When using a ByteBuffer, prefer direct byte buffers, because they allow the Interpreter to avoid unnecessary copies. If the ByteBuffer is a direct byte buffer, its order must be ByteOrder.nativeOrder(). Once it has been passed in for model inference, it must remain unchanged until the inference has finished.
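The Python sketch below mirrors the two requirements above for a float32 input: the buffer holds one 4-byte value per element in the host's native byte order, and its length must equal the element count times 4. Using `array("f")` is an assumption that the platform C float is 4-byte IEEE 754, which holds on common platforms.

```python
import struct
from array import array


def pack_float32_native(values):
    """Pack floats as contiguous float32 bytes in native byte order.

    Similar in spirit to filling a direct ByteBuffer whose order is
    ByteOrder.nativeOrder(); assumes a 4-byte C float.
    """
    return array("f", values).tobytes()


def expected_buffer_size(shape, itemsize=4):
    """Bytes a float32 buffer must have for a tensor of the given shape."""
    size = itemsize
    for dim in shape:
        size *= dim
    return size
```

For a [3, 10] input, `pack_float32_native([0.0] * 30)` produces 120 bytes, matching `expected_buffer_size([3, 10])`; a buffer of any other length would not fit the tensor.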

Outputs

Each output should be an array or multi-dimensional array of the supported primitive types, or a ByteBuffer of the appropriate size. Note that some models have dynamic outputs, where the shape of the output tensors can vary depending on the input. There is no straightforward way to handle this with the existing Java inference API, but planned extensions will make it possible.

iOS (Swift)

The Swift API is available in the TensorFlowLiteSwift Pod from CocoaPods.

First, you need to import the TensorFlowLite module.

import TensorFlowLite

// Getting model path
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling... (a guard's else branch must exit the current scope)
  fatalError("model.tflite not found in the main bundle")
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  // Placeholder: fill with the bytes the model's input tensor expects.
  var inputData = Data()

  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy the output into a Float32 buffer to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, {x, y in x * y})
  let outputData =
        UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  defer { outputData.deallocate() }
  outputTensor.data.copyBytes(to: outputData)
} catch {
  // Error handling... (`error` is bound implicitly in this block)
}

iOS (Objective-C)

The Objective-C API is available in the LiteRTObjC Pod from CocoaPods.

First, you need to import the TensorFlowLiteObjC module.

@import TensorFlowLite;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Get the input `TFLTensor`
TFLTensor *inputTensor = [interpreter inputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy the input data to the input `TFLTensor`.
[inputTensor copyData:inputData error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }

C API in Objective-C code

The Objective-C API does not support delegates. To use delegates from Objective-C code, you need to call the underlying C API directly.

#include "tensorflow/lite/c/c_api.h"

TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);

C++

The C++ API for running inference with LiteRT is compatible with Android, iOS, and Linux. On iOS, the C++ API is only available when building with Bazel.

In C++, the model is stored in the FlatBufferModel class. It encapsulates a LiteRT model, and you can build it in a couple of different ways, depending on where the model is stored:

class FlatBufferModel {
  // Build a model based on a file. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromFile(
      const char* filename,
      ErrorReporter* error_reporter);

  // Build a model based on a pre-loaded flatbuffer. The caller retains
  // ownership of the buffer and should keep it alive until the returned object
  // is destroyed. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
      const char* buffer,
      size_t buffer_size,
      ErrorReporter* error_reporter);
};

Now that you have the model as a FlatBufferModel object, you can execute it with an Interpreter. A single FlatBufferModel can be used simultaneously by more than one Interpreter.

The important parts of the Interpreter API are shown in the code snippet below. Note that:

Tensors are represented by integers, to avoid string comparisons (and any fixed dependency on string libraries).
An interpreter must not be accessed from concurrent threads.
Memory allocation for input and output tensors must be triggered by calling AllocateTensors() right after resizing tensors.
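One way to respect the "no concurrent access" rule when several threads share a single interpreter is to serialize each complete set-invoke-read sequence behind a lock. The note above is about the C++ Interpreter; the Python sketch below applies the same idea to any object exposing set_tensor(), invoke(), and get_tensor() (the method names used by the Python Interpreter). The wrapper class is hypothetical, not part of LiteRT.

```python
import threading


class SerializedInterpreter:
    """Lets several threads share one interpreter-like object, one call at a time."""

    def __init__(self, interpreter):
        self._interpreter = interpreter
        self._lock = threading.Lock()

    def run(self, input_index, input_data, output_index):
        # Hold the lock across the whole set -> invoke -> read sequence so that
        # another thread cannot overwrite the input between our write and invoke,
        # or the output between invoke and our read.
        with self._lock:
            self._interpreter.set_tensor(input_index, input_data)
            self._interpreter.invoke()
            return self._interpreter.get_tensor(output_index)
```

The lock trades throughput for safety. If requests are frequent, creating one interpreter per thread (all sharing the same model, as described above for FlatBufferModel) avoids the contention.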

The simplest usage of LiteRT with C++ looks like this:

#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);
if (model == nullptr) { /* Error handling... */ }

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if needed, then allocate.
if (interpreter->AllocateTensors() != kTfLiteOk) { /* Error handling... */ }

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

float* output = interpreter->typed_output_tensor<float>(0);

For more example code, see minimal.cc and label_image.cc.

Python

The Python API for running inference uses the Interpreter class to load a model and run inference.

Install the LiteRT package:

$ python3 -m pip install ai-edge-litert

Import the LiteRT Interpreter:

from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_path=args.model.file)

The following example shows how to use the Python interpreter to load a FlatBuffers (.tflite) file and run inference with sample input data.

This example is recommended if you are converting from a SavedModel with a defined SignatureDef.

import tensorflow as tf

class TestModel(tf.Module):
  def __init__(self):
    super(TestModel, self).__init__()

  @tf.function(input_signature=[tf.TensorSpec(shape=[1, 10], dtype=tf.float32)])
  def add(self, x):
    '''
    Simple method that accepts single input 'x' and returns 'x' + 4.
    '''
    # Name the output 'result' for convenience.
    return {'result' : x + 4}

SAVED_MODEL_PATH = 'content/saved_models/test_variable'
TFLITE_FILE_PATH = 'content/test_variable.tflite'

# Save the model
module = TestModel()
# You can omit the signatures argument and a default signature name will be
# created with name 'serving_default'.
tf.saved_model.save(
    module, SAVED_MODEL_PATH,
    signatures={'my_signature':module.add.get_concrete_function()})

# Convert the model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
tflite_model = converter.convert()
with open(TFLITE_FILE_PATH, 'wb') as f:
  f.write(tflite_model)

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)

# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# my_signature is callable with input as arguments.
output = my_signature(x=tf.constant([1.0], shape=(1,10), dtype=tf.float32))
# 'output' is dictionary with all outputs from the inference.
# In this case we have single output 'result'.
print(output['result'])

Here is another example, for a model that does not have SignatureDefs defined:

import numpy as np
import tensorflow as tf

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

As an alternative to loading the model as a pre-converted .tflite file, you can combine your code with the LiteRT converter (tf.lite.TFLiteConverter), which lets you convert a Keras model to the LiteRT format and then run inference:

import numpy as np
import tensorflow as tf

img = tf.keras.Input(shape=(64, 64, 3), name="img")
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to LiteRT format
converter = tf.lite.TFLiteConverter.from_keras_model(tf.keras.models.Model(inputs=[img], outputs=[out]))
tflite_model = converter.convert()

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...

For more Python sample code, see label_image.py.

Run inference with a dynamic-shape model

If you want to run a model with a dynamic input shape, resize the input shape before running inference. Otherwise, a None dimension in a TensorFlow model is replaced by a placeholder of 1 in the LiteRT model.

The following examples show how to resize the input shape before running inference in different languages. All of them assume that the input shape is defined as [1/None, 10] and needs to be resized to [3, 10].
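To decide which dimensions to resize, it can help to look at the shape signature rather than the concrete shape. In the Python API, as we understand it, each entry of get_input_details() includes a "shape_signature" in which dynamic dimensions appear as -1, while "shape" shows the placeholder 1. The hypothetical helper below fills in a dynamic batch dimension and refuses to guess any other dynamic dimension.

```python
def resolve_dynamic_shape(shape_signature, batch_size):
    """Replace a dynamic (-1) batch dimension with a concrete size.

    Assumes only the first (batch) dimension may be dynamic, as in the
    [None, 10] example; any other -1 raises ValueError instead of guessing.
    """
    dims = list(shape_signature)
    if any(d == -1 for d in dims[1:]):
        raise ValueError("only a dynamic batch dimension is handled here")
    if dims and dims[0] == -1:
        dims[0] = batch_size
    return dims
```

A possible usage with the Python example that follows: take `d = interpreter.get_input_details()[0]`, call `interpreter.resize_tensor_input(d['index'], resolve_dynamic_shape(d['shape_signature'], 3))`, and then call `allocate_tensors()`.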

C++ example:

// Resize input tensors before allocate tensors
interpreter->ResizeInputTensor(/*tensor_index=*/0, std::vector<int>{3,10});
interpreter->AllocateTensors();

Python example:

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_path=TFLITE_FILE_PATH)

# Resize input shape for dynamic shape model and allocate tensor
interpreter.resize_tensor_input(interpreter.get_input_details()[0]['index'], [3, 10])
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()