LiteRT CompiledModel Python API

The LiteRT CompiledModel API is available in Python. It provides a high-level interface for compiling and running TFLite models with the LiteRT runtime.

The following guide walks through basic CPU inference with the CompiledModel Python API.

Install the pip package

Install the LiteRT pip package in your Python environment.

pip install ai-edge-litert

Basic inference

Create a `CompiledModel`

Create a compiled model from a .tflite file. The current Python wrapper compiles for CPU by default.

from ai_edge_litert.compiled_model import CompiledModel

model = CompiledModel.from_file("mymodel.tflite")

You can also create a compiled model from an in-memory buffer.

from ai_edge_litert.compiled_model import CompiledModel

with open("mymodel.tflite", "rb") as f:
  model = CompiledModel.from_buffer(f.read())
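
Before handing bytes to `CompiledModel.from_buffer`, it can help to reject an obviously wrong file early. The sketch below is not part of the LiteRT API: `looks_like_tflite` is a hypothetical helper that relies on the FlatBuffers convention of storing a four-byte file identifier at offset 4, which is `TFL3` for TFLite models. It assumes a regular (not size-prefixed) buffer and does not validate the rest of the file.

```python
TFLITE_FILE_IDENTIFIER = b"TFL3"


def looks_like_tflite(data: bytes) -> bool:
    """Return True if `data` carries the TFLite FlatBuffers file identifier."""
    # Bytes 0-3 hold the root table offset; bytes 4-7 hold the identifier.
    return len(data) >= 8 and data[4:8] == TFLITE_FILE_IDENTIFIER
```

A caller could read the file once, run this check on the bytes, and pass the same bytes to `from_buffer` only when it returns True.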

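Create input and output buffers

Create the data structures (buffers) that hold the input data passed to the model for inference and the output data the model produces after inference runs.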

signature_index = 0
input_buffers = model.create_input_buffers(signature_index)
output_buffers = model.create_output_buffers(signature_index)

A signature_index value of 0 selects the model's first signature.

When using CPU memory, populate the inputs by writing NumPy arrays directly into the input buffers.

import numpy as np

input_data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffers[0].write(input_data)
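
Input data often starts as a Python list or a non-contiguous slice rather than a ready-made float32 array. The sketch below shows one way to normalize it before calling `write`. `as_batched_float32` is a hypothetical helper, and it assumes the model expects float32 data with a leading batch axis, as in the example above; check your own model's input shape and dtype.

```python
import numpy as np


def as_batched_float32(values):
    """Convert `values` to a C-contiguous float32 array with a leading batch axis."""
    array = np.ascontiguousarray(values, dtype=np.float32)
    if array.ndim == 1:
        # Treat a flat vector as a single example: shape (n,) becomes (1, n).
        array = array[np.newaxis, :]
    return array


input_data = as_batched_float32([1, 2, 3, 4])
```

The resulting `input_data` can be passed to `input_buffers[0].write(input_data)` in the same way as the hand-built array.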

Invoke the model

Run the model, passing the input and output buffers.

model.run_by_index(signature_index, input_buffers, output_buffers)

Get the output

Retrieve the output by reading the model output directly from memory.

import numpy as np

# Replace num_elements with the size of your model's output tensor.
num_elements = 4
output_array = output_buffers[0].read(num_elements, np.float32)
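
The steps so far (create buffers, write inputs, run, read outputs) can be gathered into one function. The following is a minimal sketch, not part of the LiteRT API: `run_first_signature` is a hypothetical wrapper that uses only the calls shown in this guide, takes the output shapes from the caller, and assumes `read` returns a flat array of the requested number of elements.

```python
import math

import numpy as np


def run_first_signature(model, input_arrays, output_shapes,
                        dtype=np.float32, signature_index=0):
    """Run one inference and return the outputs reshaped to `output_shapes`.

    `model` is expected to behave like the CompiledModel used in this guide.
    """
    input_buffers = model.create_input_buffers(signature_index)
    output_buffers = model.create_output_buffers(signature_index)
    if len(input_arrays) != len(input_buffers):
        raise ValueError(
            f"expected {len(input_buffers)} inputs, got {len(input_arrays)}")
    if len(output_shapes) != len(output_buffers):
        raise ValueError(
            f"expected {len(output_buffers)} output shapes, got {len(output_shapes)}")

    # Copy each NumPy array into its input buffer.
    for buffer, array in zip(input_buffers, input_arrays):
        buffer.write(array)

    model.run_by_index(signature_index, input_buffers, output_buffers)

    outputs = []
    for buffer, shape in zip(output_buffers, output_shapes):
        # The element count is the product of the tensor's dimensions.
        flat = buffer.read(math.prod(shape), dtype)
        outputs.append(np.asarray(flat).reshape(shape))
    return outputs
```

With the model from the earlier steps, a call such as `run_first_signature(model, [input_data], [(1, 4)])` would return a list holding one array of shape (1, 4), provided the model's output really has four elements.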

Use `TensorBuffer`

LiteRT provides built-in support for I/O buffer interoperability through the TensorBuffer API, which supports writing NumPy arrays (write) and reading NumPy arrays (read). The supported dtypes are np.float32, np.int32, and np.int8.
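
Because only three dtypes are listed, an array with NumPy's default float64 or int64 dtype is an easy mistake to make. The sketch below checks the dtype before any buffer call. `require_supported_dtype` is a hypothetical helper, and how the library itself reacts to an unsupported dtype is not covered here.

```python
import numpy as np

# The dtypes this guide lists as supported by TensorBuffer write and read.
SUPPORTED_DTYPES = (np.dtype(np.float32), np.dtype(np.int32), np.dtype(np.int8))


def require_supported_dtype(array):
    """Return `array` as an ndarray, or raise TypeError if its dtype is not listed."""
    array = np.asarray(array)
    if array.dtype not in SUPPORTED_DTYPES:
        names = ", ".join(d.name for d in SUPPORTED_DTYPES)
        raise TypeError(
            f"dtype {array.dtype.name} is not supported; expected one of: {names}")
    return array
```

Raising instead of casting silently is a deliberate choice in this sketch: converting float64 to float32 or int64 to int32 can lose precision or overflow, so the caller decides how to convert.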

You can also create a buffer backed by existing host memory.

import numpy as np
from ai_edge_litert.tensor_buffer import TensorBuffer

input_array = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffer = TensorBuffer.create_from_host_memory(input_array)

To run by signature name, first inspect the model's signatures, then provide maps from input and output names to TensorBuffer instances.

from ai_edge_litert.tensor_buffer import TensorBuffer

signatures = model.get_signature_list()
# Example signature structure:
# {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}

input_buffer = TensorBuffer.create_from_host_memory(input_array)
output_buffer = model.create_output_buffer_by_name("serving_default", "output_0")

model.run_by_name(
  "serving_default",
  {"input_0": input_buffer},
  {"output_0": output_buffer},
)
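
A misspelled tensor name in either map is easy to miss. The sketch below compares the maps against the signature description before the run call. `check_io_maps` is a hypothetical helper, and it assumes `get_signature_list()` returns the dictionary structure shown in the comment above.

```python
def check_io_maps(signatures, signature_name, input_map, output_map):
    """Raise if the maps do not match the named signature's tensor names."""
    if signature_name not in signatures:
        raise KeyError(
            f"unknown signature {signature_name!r}; available: {sorted(signatures)}")
    signature = signatures[signature_name]
    for kind, expected, given in (
        ("input", signature["inputs"], input_map),
        ("output", signature["outputs"], output_map),
    ):
        missing = sorted(set(expected) - set(given))
        unexpected = sorted(set(given) - set(expected))
        if missing or unexpected:
            raise ValueError(
                f"{kind} names mismatch: missing={missing}, unexpected={unexpected}")


# Example data in the structure shown above; placeholder objects stand in for buffers.
example_signatures = {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}
check_io_maps(example_signatures, "serving_default",
              {"input_0": object()}, {"output_0": object()})
```

In real use, the first argument would be the value returned by `model.get_signature_list()` and the maps would be the same dictionaries passed to `run_by_name`.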

For details on how the TensorBuffer API is implemented, see the TensorBuffer source code.

Use the GPU accelerator

If a GPU is available, you can use it simply by adding the HardwareAccelerator.GPU option to the CompiledModel creation API.

from ai_edge_litert.compiled_model import CompiledModel
from ai_edge_litert.compiled_model import HardwareAccelerator

model = CompiledModel.from_file("mymodel.tflite", HardwareAccelerator.GPU)

For the backends supported on each platform, see the LiteRT documentation on supported backends.
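
Since a GPU is not available on every machine, an application may want to try GPU compilation first and fall back to the CPU default. The following is a minimal sketch of that pattern, not a LiteRT feature: `compile_with_fallback` is a hypothetical helper, and it assumes that a failed compilation surfaces as a Python exception, which this guide does not confirm.

```python
def compile_with_fallback(factories):
    """Try each (label, factory) pair in order and return the first success.

    Returns a (label, model) tuple. Raises RuntimeError if every factory fails.
    """
    errors = []
    for label, factory in factories:
        try:
            return label, factory()
        except Exception as exc:  # Assumption: a failed compile raises.
            errors.append(f"{label}: {exc}")
    raise RuntimeError("no backend could compile the model: " + "; ".join(errors))


# Intended use with the calls from this guide (requires the imports shown above):
# label, model = compile_with_fallback([
#     ("gpu", lambda: CompiledModel.from_file("mymodel.tflite", HardwareAccelerator.GPU)),
#     ("cpu", lambda: CompiledModel.from_file("mymodel.tflite")),
# ])
```

Returning the label alongside the model lets the caller log which backend was actually selected.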