Introducing LiteRT: Google's high-performance runtime for on-device AI, formerly known as TensorFlow Lite.

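Get started with LiteRT

This guide introduces the process of running a LiteRT (short for Lite Runtime) model on-device to make predictions based on input data. This is achieved with the LiteRT interpreter, which uses a static graph ordering and a custom (less dynamic) memory allocator to keep load, initialization, and execution latency low.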

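LiteRT inference typically follows these steps:

Loading a model: Load the .tflite model into memory. The file contains the model's execution graph.
Transforming data: Transform the input data into the format and dimensions the model expects. Raw input data generally does not match the model's input format. For example, you might need to resize an image or change its image format to be compatible with the model.
Running inference: Execute the LiteRT model to make predictions. This step uses the LiteRT API and involves a few sub-steps, such as building the interpreter and allocating tensors.
Interpreting output: Interpret the output tensors in a way that is meaningful and useful in your application. For example, a model might return only a list of probabilities. It is up to you to map the probabilities to relevant categories and to format the output.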
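
The data transformation and output interpretation steps can be sketched in plain Python, independent of any LiteRT API. This is a minimal illustration only: the helper names (normalize_pixels, top_prediction), the scaling from 0-255 to 0.0-1.0, and the label list are assumptions made for the example. A real model defines its own expected input range and label set.

```python
# Hypothetical helpers for illustration; they are not part of the LiteRT API.

def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0-255) to floats in [0.0, 1.0].

    The target range is an assumption; check what your model expects.
    """
    return [value / 255.0 for value in pixels]


def top_prediction(probabilities, labels):
    """Map a list of probabilities to the most likely (label, probability)."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index], probabilities[best_index]


# Transforming data: raw pixel values become the floats a float model might take.
model_input = normalize_pixels([0, 51, 255])

# Interpreting output: a made-up probability list, as a classifier might return.
labels = ["cat", "dog", "bird"]   # hypothetical label set
probabilities = [0.1, 0.7, 0.2]   # hypothetical model output
label, score = top_prediction(probabilities, labels)
print(label, score)
```

Keeping these two steps in small, separate functions makes it easier to verify them without loading a model.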

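This guide describes how to access the LiteRT interpreter and perform inference using C++, Java, and Python.

Supported platforms

LiteRT inference APIs are provided in multiple programming languages for the most common mobile and embedded platforms, such as Android, iOS, and Linux.

In most cases, the API design reflects a preference for performance over ease of use. LiteRT is designed for fast inference on small devices, so the APIs avoid unnecessary copies at the expense of convenience.

Across all libraries, the LiteRT API lets you load models, feed inputs, and retrieve inference outputs.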

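Android platform

On Android, LiteRT inference can be performed using either the Java or the C++ API. The Java API is convenient and can be used directly within your Android Activity classes. The C++ API offers more flexibility and speed, but may require writing JNI wrappers to move data between the Java and C++ layers.

See the C++ and Java sections for more information, or follow the Android quickstart.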

iOS platform

On iOS, LiteRT is available in Swift and Objective-C iOS libraries. You can also use the C API directly in Objective-C code.

See the Swift, Objective-C, and C API sections, or follow the iOS quickstart.

Linux platform

On Linux platforms, you can run inference using the LiteRT API available in C++.

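Load and run a model

Loading and running a LiteRT model involves the following steps:

Loading the model into memory.
Building an Interpreter based on the loaded model.
Setting input tensor values.
Invoking inference.
Reading output tensor values.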
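
As a rough sketch of how these steps fit together, the function below drives any object that exposes the Python Interpreter methods used later in this guide (allocate_tensors, get_input_details, get_output_details, set_tensor, invoke, get_tensor). The function name run_inference and the DoublingInterpreter stand-in are hypothetical and exist only so the example runs without a model file. With the real package, constructing Interpreter(model_path=...) would cover the first two steps.

```python
def run_inference(interpreter, input_data):
    """Run one inference on an already constructed interpreter-like object."""
    # Allocate tensor memory before reading or writing any tensor.
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Step 3: set the value of the first input tensor.
    interpreter.set_tensor(input_details[0]["index"], input_data)

    # Step 4: invoke inference.
    interpreter.invoke()

    # Step 5: read the value of the first output tensor.
    return interpreter.get_tensor(output_details[0]["index"])


class DoublingInterpreter:
    """Stand-in that mimics the interpreter method names; not a LiteRT class.

    Its "model" simply doubles every input value.
    """

    def __init__(self):
        self._tensors = {0: None, 1: None}
        self._allocated = False

    def allocate_tensors(self):
        self._allocated = True

    def get_input_details(self):
        return [{"index": 0}]

    def get_output_details(self):
        return [{"index": 1}]

    def set_tensor(self, index, value):
        if not self._allocated:
            raise RuntimeError("allocate_tensors() must be called first")
        self._tensors[index] = list(value)

    def invoke(self):
        self._tensors[1] = [2 * v for v in self._tensors[0]]

    def get_tensor(self, index):
        return list(self._tensors[index])


print(run_inference(DoublingInterpreter(), [1.0, 2.0]))
```

With a real model, the input would typically be a NumPy array whose shape and dtype match the input tensor.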

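Android (Java)

The Java API for running inference with LiteRT is primarily designed for use with Android, so it is available as an Android library dependency (com.google.ai.edge.litert).

In Java, you use the Interpreter class to load a model and drive model inference. In many cases, this may be the only API you need.

You can initialize an Interpreter using a FlatBuffers (.tflite) file: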

public Interpreter(@NotNull File modelFile);

Or with a MappedByteBuffer:

public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);

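In both cases, you must provide a valid LiteRT model; otherwise the API throws an IllegalArgumentException. If you use a MappedByteBuffer to initialize an Interpreter, the buffer must remain unchanged for the whole lifetime of the Interpreter.

The preferred way to run inference on a model is to use signatures, which are available for models converted with TensorFlow 2.5 or later: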

try (Interpreter interpreter = new Interpreter(file_of_tensorflowlite_model)) {
  Map<String, Object> inputs = new HashMap<>();
  inputs.put("input_1", input1);
  inputs.put("input_2", input2);
  Map<String, Object> outputs = new HashMap<>();
  outputs.put("output_1", output1);
  interpreter.runSignature(inputs, outputs, "mySignature");
}

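The runSignature method takes three arguments:

Inputs: a map from the input names in the signature to the input objects.
Outputs: a map from the output names in the signature to the output data.
Signature Name (optional): the name of the signature. It can be omitted if the model has a single signature.

If the model has no defined signatures, you can run inference by calling Interpreter.run() instead. For example: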

try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}

The run() method takes only one input and returns only one output. If your model has multiple inputs or multiple outputs, use the following instead:

interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);

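In this case, each entry in inputs corresponds to an input tensor, and map_of_indices_to_outputs maps the indices of output tensors to the corresponding output data.

In both cases, the tensor indices should correspond to the values you gave to the LiteRT converter when you created the model. Note that the order of tensors in input must match the order given to the LiteRT converter.

The Interpreter class also provides convenience functions for getting the index of any model input or output from an operation name: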

public int getInputIndex(String opName);
public int getOutputIndex(String opName);

If opName is not a valid operation in the model, these functions throw an IllegalArgumentException.

Also note that the Interpreter owns resources. To avoid memory leaks, release the resources after use:

interpreter.close();

For an example project that uses Java, see the Android object detection sample app.

Supported data types

To use LiteRT, the data types of the input and output tensors must be one of the following primitive types:

float
int
long
byte

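String types are also supported, but they are encoded differently from the primitive types. In particular, the shape of a string tensor dictates the number and arrangement of strings in the tensor, while each element is itself a variable-length string. As a result, the (byte) size of the tensor cannot be computed from the shape and type alone, and consequently strings cannot be provided as a single, flat ByteBuffer argument.

If other data types are used, including boxed types such as Integer and Float, an IllegalArgumentException is thrown.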

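Inputs

Each input should be an array or multi-dimensional array of the supported primitive types, or a raw ByteBuffer of the appropriate size. If the input is an array or multi-dimensional array, the associated input tensor is implicitly resized to the array's dimensions at inference time. If the input is a ByteBuffer, the caller should first manually resize the associated input tensor (via Interpreter.resizeInput()) before running inference.

When using a ByteBuffer, prefer direct byte buffers, because they allow the Interpreter to avoid unnecessary copies. If the ByteBuffer is a direct byte buffer, its order must be ByteOrder.nativeOrder(). Once a buffer has been passed to a model inference, it must remain unchanged until that inference has finished.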
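
The sizing and byte-order requirements are not specific to Java, so the arithmetic behind them can be sketched in Python with the standard struct module. This is an illustration under stated assumptions, not LiteRT code: the helper names are hypothetical, the shape is made up, and the element type is assumed to be a 4-byte float.

```python
import struct
import sys
from functools import reduce
from operator import mul


def expected_byte_size(shape, bytes_per_element):
    """Return the byte size of a dense tensor with the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


def pack_float32_native(values):
    """Pack floats as 32-bit values in the machine's native byte order."""
    # "=" selects native byte order with standard sizes, so "f" is 4 bytes.
    return struct.pack(f"={len(values)}f", *values)


shape = [1, 2, 3]                      # hypothetical input tensor shape
values = [float(i) for i in range(6)]  # one value per tensor element
buffer = pack_float32_native(values)

# A flat buffer is only usable if its size matches the tensor exactly.
assert len(buffer) == expected_byte_size(shape, 4)
print(sys.byteorder, len(buffer))
```

A Java caller would apply the same size check to the ByteBuffer it passes in, after resizing the input tensor to the intended shape.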

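Outputs

Each output should be an array or multi-dimensional array of the supported primitive types, or a ByteBuffer of the appropriate size. Note that some models have dynamic outputs, where the shape of the output tensors can vary depending on the input. There is no straightforward way to handle this with the existing Java inference API, but planned extensions will make it possible.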

iOS (Swift)

The Swift API is available in the TensorFlowLiteSwift Pod from CocoaPods.

First, you need to import the TensorFlowLite module.

import Foundation
import TensorFlowLite

// Getting model path
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling... (a guard body must exit the current scope)
  fatalError("Model file not found in the app bundle.")
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  let inputData: Data  // Should be initialized

  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy output to a typed buffer to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, {x, y in x * y})
  let outputData =
        UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  _ = outputTensor.data.copyBytes(to: outputData)
} catch {
  // Error handling... (the thrown value is available as `error`)
}

iOS (Objective-C)

The Objective-C API is available in the LiteRTObjC Pod from CocoaPods.

First, you need to import the TensorFlowLiteObjC module.

@import TensorFlowLite;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Get the input `TFLTensor`
TFLTensor *inputTensor = [interpreter inputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy the input data to the input `TFLTensor`.
[inputTensor copyData:inputData error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }

C API in Objective-C code

The Objective-C API does not support delegates. To use delegates with Objective-C code, you need to call the underlying C API directly.

#include "tensorflow/lite/c/c_api.h"

TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);

C++

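The C++ API for running inference with LiteRT is compatible with the Android, iOS, and Linux platforms. The C++ API on iOS is only available when using bazel.

In C++, the model is stored in the FlatBufferModel class, which encapsulates a LiteRT model. You can build it in a couple of different ways, depending on where the model is stored: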

class FlatBufferModel {
  // Build a model based on a file. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromFile(
      const char* filename,
      ErrorReporter* error_reporter);

  // Build a model based on a pre-loaded flatbuffer. The caller retains
  // ownership of the buffer and should keep it alive until the returned object
  // is destroyed. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
      const char* buffer,
      size_t buffer_size,
      ErrorReporter* error_reporter);
};

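Now that you have the model as a FlatBufferModel object, you can execute it with an Interpreter. A single FlatBufferModel can be used simultaneously by more than one Interpreter.

The important parts of the Interpreter API are shown in the code snippet below. Note the following:

Tensors are represented by integers, to avoid string comparisons (and any fixed dependency on string libraries).
An interpreter must not be accessed from concurrent threads.
Memory allocation for input and output tensors must be triggered by calling AllocateTensors() right after resizing tensors.

The simplest usage of LiteRT with C++ looks like this: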

// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if needed.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

float* output = interpreter->typed_output_tensor<float>(0);

For more example code, see minimal.cc and label_image.cc.

Python

The Python API for running inference uses the Interpreter to load a model and run inference.

Install the LiteRT package:

$ python3 -m pip install ai-edge-litert

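Import the LiteRT interpreter and create an instance:

from ai_edge_litert.interpreter import Interpreter
# Use a lowercase name so the instance does not shadow the Interpreter class.
interpreter = Interpreter(model_path=args.model.file)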

The following example shows how to use the Python interpreter to load a FlatBuffers (.tflite) file and run inference with random input data.

This example is recommended if you are converting from a SavedModel with a defined SignatureDef.

import tensorflow as tf

class TestModel(tf.Module):
  def __init__(self):
    super(TestModel, self).__init__()

  @tf.function(input_signature=[tf.TensorSpec(shape=[1, 10], dtype=tf.float32)])
  def add(self, x):
    '''
    Simple method that accepts single input 'x' and returns 'x' + 4.
    '''
    # Name the output 'result' for convenience.
    return {'result' : x + 4}

SAVED_MODEL_PATH = 'content/saved_models/test_variable'
TFLITE_FILE_PATH = 'content/test_variable.tflite'

# Save the model
module = TestModel()
# You can omit the signatures argument and a default signature name will be
# created with name 'serving_default'.
tf.saved_model.save(
    module, SAVED_MODEL_PATH,
    signatures={'my_signature':module.add.get_concrete_function()})

# Convert the model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
tflite_model = converter.convert()
with open(TFLITE_FILE_PATH, 'wb') as f:
  f.write(tflite_model)

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)

# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# my_signature is callable with input as arguments.
output = my_signature(x=tf.constant([1.0], shape=(1,10), dtype=tf.float32))
# 'output' is dictionary with all outputs from the inference.
# In this case we have single output 'result'.
print(output['result'])

The following example applies when the model has no SignatureDefs defined:

import numpy as np
import tensorflow as tf

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

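As an alternative to loading the model as a pre-converted .tflite file, you can combine your code with the LiteRT converter, which lets you convert a Keras model into the LiteRT format and then run inference: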

import numpy as np
import tensorflow as tf

img = tf.keras.Input(shape=(64, 64, 3), name="img")
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to LiteRT format
converter = tf.lite.TFLiteConverter.from_keras_model(tf.keras.models.Model(inputs=[img], outputs=[out]))
tflite_model = converter.convert()

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...

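For more Python sample code, see label_image.py.

Run inference with a dynamic shape model

If you want to run a model with a dynamic input shape, resize the input shape before running inference. Otherwise, the None shape in TensorFlow models is replaced by a placeholder of 1 in LiteRT models.

The following examples show how to resize the input shape before running inference in different languages. All the examples assume that the input shape is defined as [1/None, 10] and needs to be resized to [3, 10].

C++ example: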

// Resize input tensors before allocate tensors
interpreter->ResizeInputTensor(/*tensor_index=*/0, std::vector<int>{3,10});
interpreter->AllocateTensors();

Python example:

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_path=TFLITE_FILE_PATH)

# Resize input shape for dynamic shape model and allocate tensor
interpreter.resize_tensor_input(interpreter.get_input_details()[0]['index'], [3, 10])
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
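
The target shape passed to the resize call can also be derived rather than hard-coded. The sketch below assumes that dynamic dimensions are marked with -1, as in the shape_signature entry that the TensorFlow Lite Python interpreter reports in get_input_details(); whether your LiteRT version reports the same field is an assumption to verify. The helper name concrete_shape is hypothetical.

```python
def concrete_shape(shape_signature, dynamic_size):
    """Replace every dynamic dimension (-1) with `dynamic_size`."""
    if dynamic_size < 1:
        raise ValueError("dynamic_size must be a positive integer")
    return [dynamic_size if dim == -1 else int(dim) for dim in shape_signature]


# A [None, 10] input, assumed to be reported as [-1, 10], resized for 3 rows.
print(concrete_shape([-1, 10], 3))
```

The result could then be passed as the second argument of interpreter.resize_tensor_input(...) before calling allocate_tensors(), as in the Python example above.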