TensorFlow Lite

TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and edge devices.

Key features

  • Optimized for on-device machine learning by addressing five key constraints: latency (there's no round trip to a server), privacy (no personal data leaves the device), connectivity (no internet connection is required), size (reduced model and binary size), and power consumption (efficient inference and no network connections).
  • Multiple platform support, covering Android and iOS devices, embedded Linux, and microcontrollers.
  • Diverse language support, which includes Java, Swift, Objective-C, C++, and Python.
  • High performance, with hardware acceleration and model optimization.

Development workflow

The following guide walks through each step of the workflow and provides links to further instructions:

1. Generate a TensorFlow Lite model

A TensorFlow Lite model is represented in a special efficient portable format known as FlatBuffers (identified by the .tflite file extension). This format provides several advantages over TensorFlow's protocol buffer model format, such as reduced size (small code footprint) and faster inference (data is accessed directly without an extra parsing/unpacking step). These properties enable TensorFlow Lite to execute efficiently on devices with limited compute and memory resources.

A TensorFlow Lite model can optionally include metadata that contains a human-readable model description and machine-readable data for automatic generation of pre- and post-processing pipelines during on-device inference. Refer to Add metadata for more details.
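
As an illustrative sketch, metadata can be attached with the TensorFlow Lite Support Library's metadata writers. The file paths and normalization values below are placeholder assumptions for an image classifier:

    from tflite_support.metadata_writers import image_classifier
    from tflite_support.metadata_writers import writer_utils

    # Placeholder paths and normalization parameters (assumptions, not fixed names).
    MODEL_PATH = "model.tflite"
    LABEL_FILE = "labels.txt"
    SAVE_TO_PATH = "model_with_metadata.tflite"
    INPUT_NORM_MEAN = 127.5
    INPUT_NORM_STD = 127.5

    # Create and populate metadata used to generate pre-/post-processing pipelines.
    ImageClassifierWriter = image_classifier.MetadataWriter
    writer = ImageClassifierWriter.create_for_inference(
        writer_utils.load_file(MODEL_PATH),
        [INPUT_NORM_MEAN], [INPUT_NORM_STD],
        [LABEL_FILE])
    writer_utils.save_file(writer.populate(), SAVE_TO_PATH)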

You can generate a TensorFlow Lite model in the following ways:

  • Use an existing TensorFlow Lite model: Refer to TensorFlow Lite Examples to pick an existing model. Models may or may not contain metadata.

  • Convert a TensorFlow model into a TensorFlow Lite model: Use the TensorFlow Lite Converter to convert a TensorFlow model into a TensorFlow Lite model. During conversion, you can apply optimizations such as quantization to reduce model size and latency with minimal or no loss in accuracy, as sketched below. By default, converted models don't contain metadata.
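
A minimal conversion sketch using the Python API, assuming a SavedModel in a directory named saved_model/ (both the directory name and output filename are placeholders):

    import tensorflow as tf

    # Load a TensorFlow SavedModel (directory name is a placeholder).
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")

    # Optional: apply default optimizations (post-training quantization).
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    tflite_model = converter.convert()

    # Write the FlatBuffers model to a .tflite file.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)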

2. Run inference

Inference refers to the process of executing a TensorFlow Lite model on-device to make predictions based on input data. You can run inference in the following ways based on the model type:

  • Models without metadata: Use the TensorFlow Lite Interpreter API, which is supported across multiple platforms and languages such as Java, Swift, C++, Objective-C, and Python.

  • Models with metadata: You can either leverage the out-of-box APIs using the TensorFlow Lite Task Library or build custom inference pipelines with the TensorFlow Lite Support Library.
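
For models without metadata, a minimal inference sketch with the Python Interpreter API might look like this (the model path and zero-valued input are placeholders):

    import numpy as np
    import tensorflow as tf

    # Load the model and allocate input/output tensors (path is a placeholder).
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Build a dummy input that matches the model's expected shape and dtype.
    input_data = np.zeros(input_details[0]["shape"],
                          dtype=input_details[0]["dtype"])

    interpreter.set_tensor(input_details[0]["index"], input_data)
    interpreter.invoke()

    # Read back the prediction.
    output_data = interpreter.get_tensor(output_details[0]["index"])
    print(output_data)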

On Android and iOS devices, you can improve performance using hardware acceleration. On either platform you can use a GPU Delegate, and on iOS you can use the Core ML Delegate. To add support for new hardware accelerators, you can define your own delegate.
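
In Python, a delegate can be loaded and passed to the interpreter as a sketch of this pattern; the shared-library name below is a placeholder that depends on your platform and build:

    import tensorflow as tf

    # Load a hardware-acceleration delegate from a shared library
    # (the library name is a placeholder; it varies by platform and build).
    delegate = tf.lite.experimental.load_delegate("libtensorflowlite_gpu_delegate.so")

    # Pass the delegate to the interpreter so supported ops run on the accelerator.
    interpreter = tf.lite.Interpreter(
        model_path="model.tflite",
        experimental_delegates=[delegate])
    interpreter.allocate_tensors()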

Get started

You can refer to the following guides based on your target device:

  • Android and iOS: Explore the Android quickstart and iOS quickstart.
  • Embedded Linux: Explore the Python quickstart for embedded devices such as Raspberry Pi.
  • Microcontrollers: Explore the TensorFlow Lite for Microcontrollers library.

Technical constraints