LiteRT overview

LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. You can find ready-to-run LiteRT models for a wide range of ML/AI tasks, or use the AI Edge conversion and optimization tools to convert TensorFlow, PyTorch, and JAX models to the TFLite format and run them.

Key features

Optimized for on-device machine learning: LiteRT addresses five key on-device machine learning (ODML) constraints: latency (no round trip to a server), privacy (no personal data leaves the device), connectivity (no internet connection is required), size (reduced model and binary size), and power consumption (efficient inference and no network connections).
Multi-platform support: Compatible with Android and iOS devices, embedded Linux, and microcontrollers.
Multi-framework model options: AI Edge provides tools to convert TensorFlow, PyTorch, and JAX models into the FlatBuffers format (.tflite), enabling you to use a wide range of state-of-the-art models on LiteRT. You also have access to model optimization tools that can handle quantization and metadata.
Diverse language support: Includes SDKs for Java/Kotlin, Swift, Objective-C, C++, and Python.
High performance: Hardware acceleration through specialized delegates like GPU and iOS Core ML.

Development workflow

The LiteRT development workflow involves identifying an ML/AI problem, choosing a model that solves that problem, and implementing the model on-device. The following steps walk you through the workflow and provide links to further instructions.

1. Identify the most suitable solution to the ML problem

LiteRT offers users a high level of flexibility and customizability when it comes to solving machine learning problems, making it a good fit for users who require a specific model or a specialized implementation. Users looking for plug-and-play solutions may prefer MediaPipe Tasks, which provides ready-made solutions for common machine learning tasks like object detection, text classification, and LLM inference.

Choose one of the following AI Edge frameworks:

LiteRT: Flexible and customizable runtime that can run a wide range of models. Choose a model for your use case, convert it to the LiteRT format (if necessary), and run it on-device. If you intend to use LiteRT, keep reading.
MediaPipe Tasks: Plug-and-play solutions with default models that allow for customization. Choose the task that solves your AI/ML problem, and implement it on multiple platforms. If you intend to use MediaPipe Tasks, refer to the MediaPipe Tasks documentation.

2. Choose a model

A LiteRT model is represented in an efficient portable format known as FlatBuffers, which uses the .tflite file extension.
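As a small illustration of the format, a FlatBuffers file can carry a four-byte file identifier at byte offsets 4 through 7, and the TFLite schema declares that identifier as TFL3. The following sketch uses only the Python standard library to check for it. The function names are hypothetical, and the check is a quick sanity test under those assumptions, not a full validation of the model.

```python
# File identifier declared in the TFLite FlatBuffers schema.
TFLITE_FILE_IDENTIFIER = b"TFL3"


def looks_like_tflite(data: bytes) -> bool:
    """Return True if the buffer carries the TFLite file identifier.

    In a FlatBuffers file, bytes 0-3 hold the offset to the root table
    and bytes 4-7 hold the optional file identifier.
    """
    return len(data) >= 8 and data[4:8] == TFLITE_FILE_IDENTIFIER


def file_looks_like_tflite(path: str) -> bool:
    """Read only the first 8 bytes of a file and check the identifier."""
    with open(path, "rb") as f:
        return looks_like_tflite(f.read(8))
```

A check like this can catch a truncated download or a mislabeled file before the model is handed to the runtime.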

You can use a LiteRT model in the following ways:

Use an existing LiteRT model: The simplest approach is to use a LiteRT model already in the .tflite format. These models do not require any added conversion steps. You can find LiteRT models on Kaggle Models.
Convert a model into a LiteRT model: You can use the TensorFlow, PyTorch, or JAX converter to convert models to the FlatBuffers format (.tflite) and run them in LiteRT. To get started, you can find models on the following sites:
- TensorFlow models on Kaggle Models and Hugging Face
- PyTorch models on Hugging Face and torchvision
- JAX models on Hugging Face
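For the TensorFlow path, conversion can be sketched with the `tf.lite.TFLiteConverter` API from the TensorFlow package, which returns the converted model as bytes. The function name `convert_saved_model`, its `converter_factory` parameter, and the file paths are hypothetical. TensorFlow is imported lazily so that it is only needed when the default converter is used. PyTorch and JAX models go through their own converters, which are not shown here.

```python
from pathlib import Path


def convert_saved_model(saved_model_dir, output_path, converter_factory=None):
    """Convert a TensorFlow SavedModel to a .tflite file; return its size in bytes.

    converter_factory is a hypothetical hook: any callable that takes the
    SavedModel directory and returns an object whose convert() method
    yields the serialized model as bytes.
    """
    if converter_factory is None:
        # Imported lazily so TensorFlow is only required for real conversions.
        import tensorflow as tf

        converter_factory = tf.lite.TFLiteConverter.from_saved_model

    converter = converter_factory(str(saved_model_dir))
    model_bytes = converter.convert()  # serialized FlatBuffers model

    # Write the bytes out with the .tflite extension chosen by the caller.
    Path(output_path).write_bytes(model_bytes)
    return len(model_bytes)
```

With TensorFlow installed, a call such as `convert_saved_model("my_saved_model", "model.tflite")` would write the converted model to disk; both paths are placeholders.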

A LiteRT model can optionally include metadata that contains human-readable model descriptions and machine-readable data for automatic generation of pre- and post-processing pipelines during on-device inference. Refer to Add metadata for more details.

3. Integrate the model into your app

You can run inference with your LiteRT models completely on-device on web, embedded, and mobile devices. LiteRT provides APIs for Python, Java and Kotlin for Android, Swift for iOS, and C++ for microcontrollers.

Use the following guides to implement a LiteRT model on your preferred platform:

Run on Android: Run models on Android devices using the Java/Kotlin APIs.
Run on iOS: Run models on iOS devices using the Swift APIs.
Run on Micro: Run models on embedded devices using the C++ APIs.

On Android and iOS devices, you can improve performance using hardware acceleration. On either platform you can use a GPU Delegate, and on iOS you can use the Core ML Delegate. To add support for new hardware accelerators, you can define your own delegate.

You can run inference in the following ways based on the model type:

Models without metadata: Use the LiteRT Interpreter API. It is supported on multiple platforms and in languages such as Java, Swift, C++, Objective-C, and Python.
Models with metadata: You can build custom inference pipelines with the LiteRT Support Library.
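For a model without metadata, the Interpreter flow in Python follows a fixed sequence: allocate tensors, set the input, invoke, and read the output. The sketch below assumes the `ai-edge-litert` package and a single-input, single-output model. `load_interpreter` and `run_once` are hypothetical helper names, and any model path passed in is a placeholder.

```python
def load_interpreter(model_path):
    """Create an Interpreter for a .tflite file and allocate its tensors."""
    # Imported lazily; assumes the ai-edge-litert package is installed.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def run_once(interpreter, input_data):
    """Run one inference on a single-input, single-output model.

    input_data must already match the shape and dtype reported by
    interpreter.get_input_details()[0].
    """
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    interpreter.set_tensor(input_index, input_data)  # copy the input in
    interpreter.invoke()                             # run the model
    return interpreter.get_tensor(output_index)      # copy the output out
```

Keeping `run_once` independent of how the interpreter was created means the same helper works with any object that exposes these Interpreter methods.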

Migrate from TF Lite

Applications that use TF Lite libraries will continue to function, but all new development and updates will be included only in LiteRT packages. The LiteRT APIs contain the same method names as the TF Lite APIs, so migrating to LiteRT does not require detailed code changes.
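In Python, for example, the change is largely a matter of which package the Interpreter class is imported from. The sketch below prefers the LiteRT package and falls back to the TF Lite entry point in TensorFlow. It assumes the `ai-edge-litert` and `tensorflow` package names, and `resolve_interpreter_class` is a hypothetical helper.

```python
def resolve_interpreter_class():
    """Return an Interpreter class, preferring LiteRT over TF Lite.

    Returns None when neither package provides one.
    """
    try:
        # LiteRT package (assumed to be installed as ai-edge-litert).
        from ai_edge_litert.interpreter import Interpreter

        return Interpreter
    except ImportError:
        pass

    try:
        # Legacy TF Lite entry point bundled with TensorFlow.
        import tensorflow as tf

        return tf.lite.Interpreter
    except (ImportError, AttributeError):
        return None
```

Because the method names are shared, code that calls the returned class should not need to know which package supplied it.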

For more information, refer to the migration guide.

Next steps

New users should get started with the LiteRT quickstart. For specific information, see the following sections:

Model conversion

Platform guides