The LiteRT CompiledModel API is available in Python, offering a high-level
interface for compiling and running TFLite models with the LiteRT runtime.
The following guide shows basic CPU inference with the CompiledModel Python
API.
Install the pip package
Install the LiteRT pip package in your Python environment:
pip install ai-edge-litert
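To confirm the installation, import the package in a Python one-liner (a minimal sanity check, assuming the package installed into the active environment):
python -c "from ai_edge_litert.compiled_model import CompiledModel; print('LiteRT OK')"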
Basic inference
Create CompiledModel
Create a compiled model from a .tflite file. The current Python wrapper
compiles for CPU by default.
from ai_edge_litert.compiled_model import CompiledModel
model = CompiledModel.from_file("mymodel.tflite")
You can also create a compiled model from an in-memory buffer:
from ai_edge_litert.compiled_model import CompiledModel
with open("mymodel.tflite", "rb") as f:
    model = CompiledModel.from_buffer(f.read())
Create Input and Output Buffers
Create the buffers that will hold the input data you feed into the model and the output data the model produces after running inference.
signature_index = 0
input_buffers = model.create_input_buffers(signature_index)
output_buffers = model.create_output_buffers(signature_index)
The signature_index value of 0 selects the first signature in the model.
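If your model exports more than one signature, you can list them to see their names and tensor names before choosing an index; a minimal sketch using get_signature_list, which is covered in more detail below:
signatures = model.get_signature_list()
print(signatures)
# Example structure: {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}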
If you are using CPU memory, fill the inputs by writing NumPy arrays directly into the input buffers.
import numpy as np
input_data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffers[0].write(input_data)
Invoke the model
Run the model, providing the input and output buffers.
model.run_by_index(signature_index, input_buffers, output_buffers)
Retrieve Outputs
Retrieve the outputs by reading the model output directly from memory.
import numpy as np
# Replace num_elements with the size of your model's output tensor.
num_elements = 4
output_array = output_buffers[0].read(num_elements, np.float32)
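Putting the steps above together, a minimal end-to-end CPU inference sketch might look like the following (it assumes a hypothetical mymodel.tflite whose first signature takes a single four-element float32 input and produces a four-element float32 output):
import numpy as np
from ai_edge_litert.compiled_model import CompiledModel

# Compile the model for CPU (the default for the Python wrapper).
model = CompiledModel.from_file("mymodel.tflite")

# Allocate input and output buffers for the first signature.
signature_index = 0
input_buffers = model.create_input_buffers(signature_index)
output_buffers = model.create_output_buffers(signature_index)

# Write the input, run inference, and read the result.
input_buffers[0].write(np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32))
model.run_by_index(signature_index, input_buffers, output_buffers)
print(output_buffers[0].read(4, np.float32))  # 4 = number of output elements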
Use TensorBuffer
LiteRT provides built-in support for I/O buffer interoperability through the
TensorBuffer API, which supports writing NumPy arrays (write) and reading them
back (read). Supported dtypes are np.float32, np.int32, and np.int8.
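For example, an integer tensor follows the same write/read pattern; a hypothetical sketch assuming a model whose first input and output tensors are int32:
import numpy as np

int_input = np.array([[1, 2, 3, 4]], dtype=np.int32)
input_buffers[0].write(int_input)
model.run_by_index(signature_index, input_buffers, output_buffers)
int_output = output_buffers[0].read(4, np.int32)  # 4 = number of output elements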
You can also create a buffer backed by existing host memory:
import numpy as np
from ai_edge_litert.tensor_buffer import TensorBuffer
input_array = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffer = TensorBuffer.create_from_host_memory(input_array)
To run by signature name, first inspect the model signatures, then supply
dictionaries that map input and output names to TensorBuffer instances:
from ai_edge_litert.tensor_buffer import TensorBuffer
signatures = model.get_signature_list()
# Example signature structure:
# {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}
input_buffer = TensorBuffer.create_from_host_memory(input_array)
output_buffer = model.create_output_buffer_by_name("serving_default", "output_0")
model.run_by_name(
"serving_default",
{"input_0": input_buffer},
{"output_0": output_buffer},
)
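After the call returns, read the result from the output buffer the same way as before (replace num_elements with the size of your model's output tensor):
import numpy as np

num_elements = 4
output_array = output_buffer.read(num_elements, np.float32)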
For a more complete view of how the TensorBuffer API is implemented, see the TensorBuffer source code.