
On-device Inference with LiteRT

The LiteRT CompiledModel API is the modern standard for on-device ML inference, offering streamlined hardware acceleration that significantly outperforms the Interpreter API. It simplifies the deployment of .tflite models across a wide range of edge platforms by providing a unified developer experience and advanced features designed for maximum hardware efficiency.

Why Choose the `CompiledModel` API?

While the Interpreter API remains available for backward compatibility, the CompiledModel API is where new performance and accelerator features are prioritized. It is the recommended choice for these reasons:

- Best-in-class GPU acceleration: Leverages ML Drift, the state-of-the-art GPU acceleration library, to deliver reliable GPU inference across mobile, web, desktop, and IoT devices. See GPU acceleration with LiteRT.
- Unified NPU access: Provides a single, consistent developer experience for accessing NPUs from providers such as Google Tensor, Qualcomm, and MediaTek, abstracting away vendor-specific compilers and runtime complexities. See NPU acceleration with LiteRT.
- Automated hardware selection: Automatically selects the optimal backend among CPU, GPU, and NPU, based on available hardware and internal priority logic, eliminating the need for manual delegate configuration.
- Asynchronous execution: Uses OS-level mechanisms (such as sync fences) so that hardware accelerators can start work directly when the previous task completes, without involving the CPU. This can reduce latency by up to 2x and makes for a smoother, more interactive AI experience.
- Efficient I/O buffer management: Leverages the TensorBuffer API to manage high-performance data flow between accelerators. This includes zero-copy buffer interop across AHardwareBuffer, OpenCL, and OpenGL, eliminating costly data copies between the preprocessing, inference, and post-processing stages.
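
To make the idea of automated hardware selection more concrete, here is a minimal C++ sketch of a priority-based fallback. It is not LiteRT's internal logic and it does not call the LiteRT API: `Backend`, `DeviceCapabilities`, `SelectBackend`, and `BackendName` are hypothetical names introduced for illustration, and the priority order (NPU, then GPU, then CPU) is an assumption.

```cpp
#include <string>

// Hypothetical backend identifiers for this sketch; these are not LiteRT types.
enum class Backend { kCpu, kGpu, kNpu };

// Hypothetical description of which accelerators a device exposes.
// The CPU is assumed to be always available, so it has no flag.
struct DeviceCapabilities {
  bool has_npu = false;
  bool has_gpu = false;
};

// Picks a backend using an assumed priority order: NPU > GPU > CPU.
// The real runtime's priority logic is internal and may differ.
Backend SelectBackend(const DeviceCapabilities& caps) {
  if (caps.has_npu) return Backend::kNpu;
  if (caps.has_gpu) return Backend::kGpu;
  return Backend::kCpu;  // Fallback that is assumed to always work.
}

// Returns a human-readable name for logging.
std::string BackendName(Backend backend) {
  switch (backend) {
    case Backend::kNpu: return "NPU";
    case Backend::kGpu: return "GPU";
    case Backend::kCpu: return "CPU";
  }
  return "unknown";
}
```

With this sketch, a device that reports only a GPU would resolve to `Backend::kGpu`, and a device with no accelerators would fall back to `Backend::kCpu`. The point of the real feature is that application code does not have to write this kind of branching itself.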

Get Started with the `CompiledModel` API

For classical ML models, see the following demo apps:
- Image segmentation Kotlin App: CPU/GPU/NPU inference.
- Image segmentation C++ App: CPU/GPU/NPU inference with async execution.
For GenAI models, see the following demo apps:
- EmbeddingGemma semantic similarity C++ App: CPU/GPU/NPU inference.

Supported platforms

The LiteRT CompiledModel API supports high-performance inference across Android, iOS, Web, IoT, and Desktop devices. See the platform-specific guides.
