Neural Processing Units (NPUs) offer specialized hardware blocks optimized for deep learning workloads. They are increasingly available in modern systems on a chip (SoCs), especially on mobile devices. Their performance and power efficiency make them a great fit for running LLM inference on device.
LiteRT-LM is a C++ library designed to efficiently run language model pipelines on a wide range of devices, from mobile phones to embedded systems. It provides developers with the tools to create and deploy sophisticated language model workflows, now with seamless NPU integration.
## NPU Vendors
LiteRT-LM supports running LLMs with NPU acceleration from the following vendors. Follow the instructions for the vendor you want to target:
## Quick Start
- Follow the Prerequisites to set up the environment and repository.
- Ensure `adb` is installed and a device is connected.
- See the Quick Start and the `litert_lm_main` command-line demo.
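The prerequisite checks above can be scripted. A minimal preflight sketch (`check_tool` is a hypothetical helper, not part of LiteRT-LM):

```shell
#!/bin/sh
# Preflight sketch for the Quick Start: warn early if a required tool is
# missing from PATH before attempting the build and push steps.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}
check_tool adb
check_tool bazel
# With a device attached, `adb devices` lists it as "<serial>	device".
```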
## Qualcomm AI Engine Direct
### Step 1: Download the `.litertlm` model
Download a `.litertlm` model matching your device's SoC (examples below). You can
query the SoC model with:

```shell
SOC_MODEL=$(adb shell getprop ro.soc.model | tr '[:upper:]' '[:lower:]')
echo "https://huggingface.co/litert-community/Gemma3-1B-IT/blob/main/Gemma3-1B-IT_q4_ekv1280_${SOC_MODEL}.litertlm"
```
| Model | SoC | Quantization | Context size | Model Size (MB) | Download |
|---|---|---|---|---|---|
| Gemma3-1B | SM8750 | 4-bit per-channel | 1280 | 658 | download |
| Gemma3-1B | SM8650 | 4-bit per-channel | 1280 | 658 | download |
| Gemma3-1B | SM8550 | 4-bit per-channel | 1280 | 657 | download |
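A download command can be sketched from the URL pattern above. Note the assumption that replacing `blob` with `resolve` in the Hugging Face URL yields the raw file (standard Hugging Face convention):

```shell
# Sketch: build a direct-download URL for your SoC's model.
SOC_MODEL=sm8750   # or: $(adb shell getprop ro.soc.model | tr '[:upper:]' '[:lower:]')
MODEL_URL="https://huggingface.co/litert-community/Gemma3-1B-IT/resolve/main/Gemma3-1B-IT_q4_ekv1280_${SOC_MODEL}.litertlm"
echo "$MODEL_URL"
# curl -L -o model.litertlm "$MODEL_URL"   # ~658 MB for SM8750
```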
### Step 2: Download and extract the QAIRT libraries
Download the QAIRT SDK, extract it, and set QAIRT_ROOT:
```shell
unzip <your_file.zip> -d ~/
export QAIRT_ROOT=~/qairt/2.34.0.250424
```
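Before building, it can help to confirm the SDK extracted where Step 4 expects it. A small sanity-check sketch (the `lib/aarch64-android` directory is the one pushed to the device later):

```shell
# Sanity check: the runtime libraries pushed in Step 4 should exist under
# the extracted SDK; adjust the version segment to match your download.
QAIRT_ROOT=~/qairt/2.34.0.250424
LIB_DIR="$QAIRT_ROOT/lib/aarch64-android"
if [ -d "$LIB_DIR" ]; then
  echo "found: $LIB_DIR"
else
  echo "not found: $LIB_DIR (check the SDK version in the path)"
fi
```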
### Step 3: Build the LiteRT-LM runtime and libraries
Install Android NDK r28b+ and build:
```shell
bazel build --config=android_arm64 //runtime/engine:litert_lm_main
bazel build --config=android_arm64 \
  @litert//litert/vendors/qualcomm/dispatch:dispatch_api_so
```
### Step 4: Run the model on device
Set your device path and push assets:
```shell
export DEVICE_FOLDER=/data/local/tmp
adb shell mkdir -p $DEVICE_FOLDER
export MODEL_PATH=<path-to-model.litertlm>
adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm
adb push $QAIRT_ROOT/lib/aarch64-android/* $DEVICE_FOLDER/
adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER/
adb shell chmod +x $DEVICE_FOLDER/litert_lm_main
```
Run:
```shell
adb shell "cd $DEVICE_FOLDER && \
  QAIRT_ROOT=$DEVICE_FOLDER \
  ./litert_lm_main --model=model.litertlm \
    --prompt='Explain the history of LiteRT in 3 bullet points' \
    --device=qualcomm_npu --rounds=1"
```
## MediaTek NeuroPilot

### Steps overview
The MediaTek flow mirrors the Qualcomm one: use a `.litertlm` built for your SoC,
include the NeuroPilot runtime libraries, build `litert_lm_main`, push the assets,
and run with `--device=mediatek_npu`.
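The final invocation can be sketched by mirroring the Qualcomm run above; only the `--device` flag changes, while library paths depend on your NeuroPilot SDK:

```shell
# Sketch: assemble the on-device command, mirroring the Qualcomm run above.
DEVICE_FOLDER=/data/local/tmp
RUN_CMD="cd $DEVICE_FOLDER && ./litert_lm_main --model=model.litertlm --device=mediatek_npu --rounds=1"
echo "$RUN_CMD"
# adb shell "$RUN_CMD"   # uncomment once the model and binary are pushed
```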