Hardware acceleration is the use of specialized computer hardware to improve the execution speed and efficiency of an AI model. For LiteRT, this primarily means running AI inference on Graphics Processing Units (GPUs) or Neural Processing Units (NPUs), as well as using vector instructions on general-purpose Central Processing Units (CPUs).
The LiteRT Compiled Model API handles acceleration in two steps:
- Compilation: prepares a model to run on specific hardware.
- Dispatch: runs selected operations on the relevant hardware.
The compilation phase modifies a LiteRT model with a new interface that offers more flexibility through compiler plugins. Model compilation occurs ahead of time (AOT), before the graph is executed, and tailors a specific graph to run on the device.
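As a minimal sketch of the two steps with the Compiled Model C++ API (this also assumes the buffer helpers from the same API; the model path, input values, and output size are placeholders):

// Compilation: load the model and prepare it for the chosen accelerator (CPU here).
LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("model.tflite"));
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
                        CompiledModel::Create(env, model, kLiteRtHwAcceleratorCpu));

// Dispatch: allocate buffers, write inputs, run the prepared operations, read outputs.
LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());
float input_data[] = {/* your input values */};
input_buffers[0].Write<float>(absl::MakeConstSpan(input_data));
compiled_model.Run(input_buffers, output_buffers);
std::vector<float> output_data(output_size);  // output_size: element count of the output tensor
output_buffers[0].Read<float>(absl::MakeSpan(output_data));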
Types of accelerators
LiteRT provides three types of accelerators: NPU, GPU and CPU.
- NPU: a unified interface that currently supports Qualcomm AI Engine Direct and MediaTek NeuroPilot.
- GPU: OpenCL backend plus WebGPU and Metal backends using the Compiled Model API.
- CPU: XNNPACK-optimized execution is the default and always available.
These accelerators can be combined to get the best possible performance when some complex operations are not available on a given accelerator. When more than one accelerator can run an operation, LiteRT uses the following order of precedence: NPU, GPU, CPU.
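As an illustration of combining accelerators, here is a hedged sketch; it assumes the hardware accelerator flags can be combined as a bitmask when creating the compiled model, which you should verify against the LiteRT headers:

// Assumption: accelerator flags combine as a bitmask. Operations the NPU
// cannot run would fall back to the GPU, and then to the CPU.
LITERT_ASSIGN_OR_RETURN(
    auto compiled_model,
    CompiledModel::Create(env, model,
                          kLiteRtHwAcceleratorNpu | kLiteRtHwAcceleratorGpu |
                              kLiteRtHwAcceleratorCpu));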
GPU acceleration
With LiteRT GPU acceleration, you can create GPU-friendly input and output buffers, achieve zero-copy with your data in GPU memory, and execute tasks asynchronously to maximize parallelism. No delegate plumbing is required; you simply specify the accelerator when compiling the model:
// Create a compiled model targeting GPU
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
                        CompiledModel::Create(env, model, kLiteRtHwAcceleratorGpu));
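After compiling for the GPU, the input and output buffers you create from the compiled model are the GPU-friendly buffers mentioned above. A short sketch of the subsequent steps, reusing the buffer helpers from the earlier example (the input values are placeholders):

// The buffers below are allocated by the GPU-compiled model, so data written
// to them can be consumed on the GPU without an extra host-side copy.
LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());
input_buffers[0].Write<float>(absl::MakeConstSpan(input_data));
compiled_model.Run(input_buffers, output_buffers);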
For more information on GPU acceleration, see GPU acceleration with LiteRT.
NPU acceleration
LiteRT provides a unified interface to harness NPUs without forcing you to navigate vendor-specific compilers, runtimes, or library dependencies individually. It supports Qualcomm and MediaTek NPUs through both ahead-of-time (AOT) and on-device compilation paths.
Using NPUs with LiteRT typically involves converting and compiling a model with Play for On-device AI (PODAI) and deploying the model with Play AI Pack and Feature Module.
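At run time, selecting the NPU follows the same pattern as the GPU example above. A sketch; the flag name assumes the NPU follows the naming used for the CPU and GPU accelerators:

// Create a compiled model targeting the NPU.
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
                        CompiledModel::Create(env, model, kLiteRtHwAcceleratorNpu));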