tf.lite.Interpreter

Interpreter interface for running TensorFlow Lite models.

Models obtained from TFLiteConverter can be run in Python with Interpreter.

As an example, let's generate a simple Keras model and convert it to TFLite. (TFLiteConverter also supports other input formats through from_saved_model and from_concrete_function.)

import numpy as np
import tensorflow as tf

# Tiny training set for y = 2x.
x = np.array([[1.], [2.]])
y = np.array([[2.], [4.]])
model = tf.keras.models.Sequential([
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(units=1, input_shape=[1])
])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x, y, epochs=1)
# Convert the trained Keras model to a TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

tflite_model can be saved to a file and loaded later, or passed directly to the Interpreter. Because TensorFlow Lite pre-plans tensor allocations to optimize inference, you must call allocate_tensors() before running any inference.

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()  # Needed before execution!

Sample execution:

output_details = interpreter.get_output_details()[0]  # Model has a single output.
input_details = interpreter.get_input_details()[0]  # Model has a single input.
input_data = tf.constant(1., shape=[1, 1])
interpreter.set_tensor(input_details['index'], input_data)
interpreter.invoke()
interpreter.get_tensor(output_details['index']).shape
(1, 1)

Use get_signature_runner() for a more user-friendly inference API.

model_path Path to TF-Lite Flatbuffer file.
model_content Content of model.
experimental_delegates Experimental. Subject to change. List of TfLiteDelegate objects returned by lite.load_delegate().
num_threads Sets the number of threads used by the interpreter and available to CPU kernels. If not set, the interpreter uses an implementation-dependent default number of threads. Currently, only a subset of kernels, such as conv, support multi-threading. num_threads must be >= -1. Setting num_threads to 0 disables multithreading, which is equivalent to setting num_threads to 1. If set to -1, the number of threads used is implementation-defined and platform-dependent.
experimental_op_resolver_type The op resolver used by the interpreter. It must be an instance of OpResolverType. By default, we use the built-in op resolver which corresponds to tflite::ops::builtin::BuiltinOpResolver in C++.
experimental_preserve_all_tensors If true, intermediate tensors used during computation are preserved for inspection, and if the passed op resolver type is AUTO or BUILTIN, the type is changed to BUILTIN_WITHOUT_DEFAULT_DELEGATES so that no TensorFlow Lite default delegates are applied. If false, getting intermediate tensors could result in undefined values or None, especially when the graph is successfully modified by the TensorFlow Lite default delegate.
experimental_disable_delegate_clustering If true, don't perform delegate clustering during the delegate graph partitioning phase. Disabling delegate clustering makes the execution order of ops respect the explicitly inserted control dependencies in the graph (inserted via with tf.control_dependencies()), since the TF Lite converter drops control dependencies by default. Leave this flag False if you do not insert explicit control dependencies or if the graph already executes in the expected order. For automatically inserted control dependencies (with tf.Variable, tf.Print, etc.), you don't need to set this flag to True, since they are respected by default. Note that this flag is currently experimental, and it might be removed or updated if the TF Lite converter stops dropping such control dependencies in the model. Default is False.
experimental_default_delegate_latest_features If true, default delegates may enable all flag-protected features. Default is False.

ValueError If the interpreter could not be created.

Methods

allocate_tensors

get_input_details

Gets model input tensor details.

Returns
A list in which each item is a dictionary with details about an input tensor. Each dictionary contains the following fields that describe the tensor:

  • name: The tensor name.
  • index: The tensor index in the interpreter.
  • shape: The shape of the tensor.
  • shape_signature: Same as shape for models with known/fixed shapes. If any dimension sizes are unknown, they are indicated with -1.
  • dtype: The numpy data type (such as np.int32 or np.uint8).
  • quantization: Deprecated, use quantization_parameters. This field only works for per-tensor quantization, whereas quantization_parameters works in all cases.
  • quantization_parameters: A dictionary of parameters used to quantize the tensor:
      ◦ scales: List of scales (one if per-tensor quantization).
      ◦ zero_points: List of zero_points (one if per-tensor quantization).
      ◦ quantized_dimension: Specifies the dimension of per-axis quantization, in the case of multiple scales/zero_points.
  • sparsity_parameters: A dictionary of parameters used to encode a sparse tensor. This is empty if the tensor is dense.
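
As an illustration of how the quantization_parameters field might be used, the sketch below maps quantized integers back to real values with the usual affine relation real = scale * (quantized - zero_point). The helper name dequantize and the example dictionary are hypothetical, not part of the API, and only the per-tensor case (one scale, one zero point) is handled.

```python
from typing import Any, Dict, List, Sequence


def dequantize(values: Sequence[int], params: Dict[str, Any]) -> List[float]:
    """Map per-tensor quantized integers back to real values.

    `params` is assumed to look like the `quantization_parameters`
    dictionary described above. Hypothetical helper, not a TFLite API.
    """
    scales = params["scales"]
    zero_points = params["zero_points"]
    if len(scales) != 1 or len(zero_points) != 1:
        raise ValueError("this sketch handles per-tensor quantization only")
    scale = float(scales[0])
    zero_point = int(zero_points[0])
    # Affine dequantization: real = scale * (quantized - zero_point).
    return [scale * (int(q) - zero_point) for q in values]


# Made-up parameters for illustration; real values come from
# interpreter.get_input_details() or interpreter.get_output_details().
example_params = {"scales": [0.5], "zero_points": [10], "quantized_dimension": 0}
restored = dequantize([10, 12, 8], example_params)
print(restored)
```

Per-axis quantization would instead need one scale and zero point per slice along quantized_dimension, which this sketch deliberately rejects.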

get_output_details

Gets model output tensor details.

Returns
A list in which each item is a dictionary with details about an output tensor. The dictionary contains the same fields as described for get_input_details().
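
Because input and output details share the same fields, one helper can inspect either. The following is a minimal sketch that reports which axes of a tensor are dynamic by looking for -1 in shape_signature. The function name dynamic_axes_of and the sample dictionary are hypothetical and only mimic the documented structure.

```python
from typing import Any, Dict, List


def dynamic_axes_of(detail: Dict[str, Any]) -> List[int]:
    """Return the axes whose size is unknown (-1) in `shape_signature`.

    `detail` is assumed to be one item of the list returned by
    get_input_details() or get_output_details(). Hypothetical helper.
    """
    return [axis for axis, size in enumerate(detail["shape_signature"])
            if size == -1]


# Made-up detail entry for illustration: a dynamic batch dimension.
example_detail = {
    "name": "output_0",
    "index": 3,
    "shape": [1, 10],
    "shape_signature": [-1, 10],
}
dynamic_axes = dynamic_axes_of(example_detail)
print(dynamic_axes)
```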

get_signature_list

Gets the list of SignatureDefs in the model.

Example:

signatures = interpreter.get_signature_list()
print(signatures)

# {
#   'add': {'inputs': ['x', 'y'], 'outputs': ['output_0']}
# }

Then, using the names in the signature list, you can get a callable from get_signature_runner().
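
Before calling a signature, it can help to check the input names you intend to pass against the signature list. The sketch below does this on a plain dictionary shaped like the get_signature_list() output shown above. The helper name missing_signature_inputs is hypothetical and is not part of the Interpreter API.

```python
from typing import Dict, Iterable, List


def missing_signature_inputs(
    signatures: Dict[str, Dict[str, List[str]]],
    signature_key: str,
    provided: Iterable[str],
) -> List[str]:
    """Return the input names of `signature_key` that were not provided.

    Raises KeyError for an unknown signature and ValueError for input
    names the signature does not declare. Hypothetical helper.
    """
    if signature_key not in signatures:
        raise KeyError(f"unknown signature: {signature_key!r}")
    expected = set(signatures[signature_key]["inputs"])
    given = set(provided)
    unexpected = sorted(given - expected)
    if unexpected:
        raise ValueError(f"unexpected inputs: {unexpected}")
    return sorted(expected - given)


# Same structure as the get_signature_list() output shown above.
signatures = {"add": {"inputs": ["x", "y"], "outputs": ["output_0"]}}
missing = missing_signature_inputs(signatures, "add", ["x"])
print(missing)
```

An empty result means every declared input has been supplied.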