LiteRT CompiledModel Python API

The LiteRT CompiledModel API is available in Python, offering a high-level interface for compiling and running TFLite models with the LiteRT runtime.

The following guide shows basic CPU inference with the CompiledModel Python API.

Install the pip package

Install the LiteRT pip package in your Python environment:

pip install ai-edge-litert

Basic inference

Create CompiledModel

Create a compiled model from a .tflite file. The current Python wrapper compiles for CPU by default.

from ai_edge_litert.compiled_model import CompiledModel

model = CompiledModel.from_file("mymodel.tflite")

You can also create a compiled model from an in-memory buffer:

from ai_edge_litert.compiled_model import CompiledModel

with open("mymodel.tflite", "rb") as f:
  model = CompiledModel.from_buffer(f.read())

Create input and output buffers

Create the buffers that hold the input data you feed into the model and the output data the model produces when inference runs.

signature_index = 0
input_buffers = model.create_input_buffers(signature_index)
output_buffers = model.create_output_buffers(signature_index)

The signature_index value of 0 selects the first signature in the model.
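
If you are unsure which signatures the model exposes, you can list them with get_signature_list(), which is covered in more detail later in this guide. The sketch below assumes the signature indices follow the order of the returned signature names.

# List the model's signatures to confirm what index 0 refers to.
# Assumption: index order matches the order of the returned names.
signatures = model.get_signature_list()
print(signatures)
# e.g. {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}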

If you are using CPU memory, fill the inputs by writing NumPy arrays directly into the input buffers.

import numpy as np

input_data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffers[0].write(input_data)

Invoke the model

Run the model, passing the input and output buffers.

model.run_by_index(signature_index, input_buffers, output_buffers)

Retrieve outputs

Retrieve the outputs by reading them directly from the output buffers.

import numpy as np

# Replace num_elements with the size of your model's output tensor.
num_elements = 4
output_array = output_buffers[0].read(num_elements, np.float32)
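
Putting these steps together, a minimal end-to-end CPU inference sketch looks like the following. It assumes a model file named mymodel.tflite whose first signature takes a single float32 input of four elements and produces a four-element output, as in the snippets above; substitute your own file name and tensor sizes.

import numpy as np
from ai_edge_litert.compiled_model import CompiledModel

# Compile the model for CPU (the default backend of the Python wrapper).
model = CompiledModel.from_file("mymodel.tflite")

# Allocate input and output buffers for the first signature.
signature_index = 0
input_buffers = model.create_input_buffers(signature_index)
output_buffers = model.create_output_buffers(signature_index)

# Write the input, run inference, and read the result back.
input_buffers[0].write(np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32))
model.run_by_index(signature_index, input_buffers, output_buffers)
output_array = output_buffers[0].read(4, np.float32)  # 4 = output tensor size
print(output_array)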

Use TensorBuffer

LiteRT provides built-in support for I/O buffer interoperability through the TensorBuffer API, which can write NumPy arrays into a buffer (write) and read them back out as NumPy arrays (read). The supported dtypes are np.float32, np.int32, and np.int8.
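
As an illustration, the buffers returned by create_input_buffers and create_output_buffers are TensorBuffer instances, so a write followed by a read round-trips a NumPy array through the buffer. The sketch below reuses the float32 input buffer created earlier and assumes that read works on input buffers the same way it does on output buffers.

import numpy as np

input_array = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffers[0].write(input_array)
# Assumption: reading an input buffer returns a flat array of
# num_elements values of the requested dtype, as with output buffers.
round_trip = input_buffers[0].read(4, np.float32)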

You can also create a buffer backed by existing host memory:

import numpy as np
from ai_edge_litert.tensor_buffer import TensorBuffer

input_array = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffer = TensorBuffer.create_from_host_memory(input_array)

To run the model by signature name, first inspect the model's signatures, then pass dictionaries that map input and output names to TensorBuffer instances:

from ai_edge_litert.tensor_buffer import TensorBuffer

signatures = model.get_signature_list()
# Example signature structure:
# {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}

input_buffer = TensorBuffer.create_from_host_memory(input_array)
output_buffer = model.create_output_buffer_by_name("serving_default", "output_0")

model.run_by_name(
  "serving_default",
  {"input_0": input_buffer},
  {"output_0": output_buffer},
)
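
After run_by_name returns, read the result from the output buffer the same way as in the index-based flow. The element count below (4) is a placeholder; use the size of your model's output tensor.

import numpy as np

output_array = output_buffer.read(4, np.float32)
print(output_array)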

For a more complete view of how the TensorBuffer API is implemented, see the TensorBuffer source code.