
Run LLMs using LiteRT-LM

LiteRT-LM is a cross-platform library designed to efficiently run language model pipelines on a wide range of devices, from mobile phones to embedded systems. It gives developers the tools to build and deploy sophisticated language model workflows, and it now supports NPU acceleration.

Run LLMs on CPU and GPU

See the LiteRT-LM GitHub repository for detailed instructions on cross-platform development and CPU/GPU hardware acceleration.

Run LLMs on NPU

Neural Processing Units (NPUs) are specialized hardware blocks optimized for deep learning workloads. They are increasingly available in modern systems on a chip (SoCs), especially on mobile devices. Because they are built for these workloads, they are well suited to running LLM inference.

NPU Vendors

LiteRT-LM supports NPU-accelerated LLM inference on hardware from the following vendors. Follow the instructions for the vendor you want to use:

Qualcomm AI Engine Direct
MediaTek NeuroPilot
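If you are not sure which vendor your device uses, the SoC model string returned by adb shell getprop ro.soc.model (used in Step 1 of both vendors' instructions) is a good hint. The sketch below is a hypothetical helper, npu_vendor_for_soc, that maps the SoC models listed in this page's download tables to a vendor. It only knows those SoCs and reports anything else as unsupported rather than guessing.

```shell
#!/bin/sh
# Hypothetical helper: map an SoC model string (as returned by
# "adb shell getprop ro.soc.model") to the vendor instructions to follow.
# Only the SoCs listed in the download tables on this page are recognized.
npu_vendor_for_soc() {
  # Strip a possible trailing carriage return and lowercase the input,
  # matching the lowercase SoC names used in the download URLs.
  soc=$(printf '%s' "$1" | tr -d '\r' | tr '[:upper:]' '[:lower:]')
  case "$soc" in
    sm8750|sm8650|sm8550) echo "qualcomm" ;;
    mt6989|mt6991) echo "mediatek" ;;
    *) echo "unsupported"; return 1 ;;
  esac
}
```

For example: npu_vendor_for_soc "$(adb shell getprop ro.soc.model)". The SoC list is copied by hand from the tables, so it has to be updated as support for new NPUs is added.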

Quick Start

To get started, first follow the Prerequisites instructions to set up the environment and the repository.

To interact with your Android device, make sure Android Debug Bridge (adb) is installed and that a connected device is reachable with adb.
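One way to confirm that adb can reach a device is to inspect the output of adb devices, which lists each device's serial number and state separated by a tab. The hypothetical helper below, count_ready_devices, reads that output on stdin and counts only devices in the ready state (device); devices listed as unauthorized or offline are not counted.

```shell
#!/bin/sh
# Hypothetical helper: count devices reported as ready by "adb devices".
# Reads the command's output on stdin; device lines look like "<serial>\t<state>".
count_ready_devices() {
  # Header and daemon status lines contain no tab, so $2 is empty for them.
  awk -F '\t' '$2 == "device" { n++ } END { print n + 0 }'
}
```

For example, adb devices | count_ready_devices should print at least 1 before you continue.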

For more detailed instructions, see the Quick Start section in the LiteRT-LM repository, which also describes the litert_lm_main command-line demo.

Qualcomm AI Engine Direct

To run an LLM on a Qualcomm NPU, follow these steps:

Step 1: Download the .litertlm model

The LiteRT-LM runtime is designed to work with models in the .litertlm format. You can find and download compatible models in the following table.

Model	SoC	Quantization	Context size (tokens)	Model size (MB)	Download link
Gemma3-1B	SM8750	4-bit per-channel	1280	658	download
Gemma3-1B	SM8650	4-bit per-channel	1280	658	download
Gemma3-1B	SM8550	4-bit per-channel	1280	657	download

Download the model that matches your device's SoC. The following commands print the Hugging Face link for the model that matches the SoC of your phone. Note that you must log in to Hugging Face and accept the form on the model page before you have permission to download the file. Make sure a connected device is reachable with adb.

SOC_MODEL=$(adb shell getprop ro.soc.model | tr '[:upper:]' '[:lower:]')
echo "https://huggingface.co/litert-community/Gemma3-1B-IT/blob/main/Gemma3-1B-IT_q4_ekv1280_${SOC_MODEL}.litertlm"

Verify that $SOC_MODEL is listed in the table above; the generated link does not work for unsupported SoCs. Support for new NPUs is added regularly, so check back later if your device is not yet supported.
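The check above can be scripted. The sketch below is a hypothetical helper, qualcomm_model_url, that lowercases the SoC model, rejects SoCs that are not in the Qualcomm table, and otherwise prints the same Hugging Face link as the echo command above. The SoC list is copied from the table and needs updating as support grows.

```shell
#!/bin/sh
# Hypothetical helper: print the Gemma3-1B .litertlm link for a Qualcomm SoC,
# or fail if the SoC is not in the Qualcomm support table on this page.
qualcomm_model_url() {
  soc=$(printf '%s' "$1" | tr -d '\r' | tr '[:upper:]' '[:lower:]')
  case "$soc" in
    sm8750|sm8650|sm8550) ;;  # SoCs listed in the Qualcomm table
    *) echo "SoC '$soc' is not in the support table" >&2; return 1 ;;
  esac
  echo "https://huggingface.co/litert-community/Gemma3-1B-IT/blob/main/Gemma3-1B-IT_q4_ekv1280_${soc}.litertlm"
}
```

For example: qualcomm_model_url "$(adb shell getprop ro.soc.model)". On an unsupported SoC it prints an error to stderr and returns a non-zero status instead of a broken link.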

Step 2: Download and extract the QAIRT libraries

Running the model on a Qualcomm NPU requires the associated runtime libraries on the device. Download the QAIRT SDK from the link and extract it. Then set the variable QAIRT_ROOT, which later steps use, to the extracted folder named after the SDK version, for example:

unzip <your_file.zip> -d ~/

QAIRT_ROOT=~/qairt/2.42.0.251225
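Later steps push libraries from $QAIRT_ROOT/lib/aarch64-android/, so a quick check that QAIRT_ROOT points at the versioned folder can save a failed push. The hypothetical helper below checks that the folder name looks like a dotted version number (such as 2.42.0.251225) and that lib/aarch64-android exists under it. The version pattern is an assumption based on the example path above.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a QAIRT_ROOT path.
check_qairt_root() {
  root=${1%/}                    # drop a trailing slash, if any
  name=$(basename "$root")
  # Assumed naming: dotted numbers such as 2.42.0.251225 (loose glob check).
  case "$name" in
    [0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "expected a versioned folder, got '$name'" >&2; return 1 ;;
  esac
  if [ ! -d "$root/lib/aarch64-android" ]; then
    echo "missing $root/lib/aarch64-android" >&2
    return 1
  fi
  echo "QAIRT_ROOT looks OK: $root"
}
```

For example: check_qairt_root "$QAIRT_ROOT". Pointing it at ~/qairt instead of the versioned subfolder fails the name check.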

Step 3: Build the LiteRT-LM runtime / libraries

Develop in Linux

To build the binary for Android, install Android NDK r28b or newer:

Download the zip file from https://developer.android.com/ndk/downloads#stable-downloads.
Extract the zip file to your preferred location (say /path/to/AndroidNDK/).
Set ANDROID_NDK_HOME to point to the NDK directory. It should look something like:

export ANDROID_NDK_HOME=/path/to/AndroidNDK/

Tip: make sure ANDROID_NDK_HOME points to the directory that contains README.md.
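The tip above can be turned into a check to run before building. The hypothetical helper below, check_ndk_home, fails if ANDROID_NDK_HOME is unset or if the directory it names has no README.md.

```shell
#!/bin/sh
# Hypothetical helper: verify that ANDROID_NDK_HOME is set and points to
# the NDK root, i.e. the directory that contains README.md.
check_ndk_home() {
  if [ -z "${ANDROID_NDK_HOME:-}" ]; then
    echo "ANDROID_NDK_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "$ANDROID_NDK_HOME/README.md" ]; then
    echo "README.md not found in $ANDROID_NDK_HOME" >&2
    return 1
  fi
  echo "ANDROID_NDK_HOME=$ANDROID_NDK_HOME"
}
```

Running check_ndk_home right after the export command catches a path that points one level too high or too low.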

With the NDK set up, build the litert_lm_main binary:

bazel build --config=android_arm64 //runtime/engine:litert_lm_main

Then build the dispatch API library, which the LiteRT-LM runtime needs to interact with the NPU:

bazel build --config=android_arm64 \
    @litert//litert/vendors/qualcomm/dispatch:dispatch_api_so
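Before moving on to the device, you may want to confirm that both builds produced their outputs. The hypothetical helper below, check_build_outputs, looks for the litert_lm_main binary and at least one dispatch .so for the given vendor, using the same bazel-bin paths that the on-device step pushes from. Run it from the repository root.

```shell
#!/bin/sh
# Hypothetical helper: check the Bazel outputs that are later pushed to the
# device. $1 is the vendor directory name (qualcomm or mediatek).
check_build_outputs() {
  vendor=$1
  status=0
  if [ ! -x bazel-bin/runtime/engine/litert_lm_main ]; then
    echo "missing bazel-bin/runtime/engine/litert_lm_main" >&2
    status=1
  fi
  # Same glob as the adb push command for the dispatch library.
  found=0
  for so in bazel-bin/external/litert/litert/vendors/"$vendor"/*/*.so; do
    if [ -f "$so" ]; then found=1; fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no dispatch .so found for $vendor" >&2
    status=1
  fi
  return "$status"
}
```

For this build, run check_build_outputs qualcomm; the same helper works with mediatek for the MediaTek build.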

Develop in macOS

The Xcode command line tools include clang. If they are not installed yet, run xcode-select --install.

To build the binary for Android, install Android NDK r28b or newer:

Download the .dmg file from https://developer.android.com/ndk/downloads#stable-downloads.
Open the .dmg file and move the AndroidNDK* app to your preferred location (say /path/to/AndroidNDK/).
Set ANDROID_NDK_HOME to point to the NDK directory inside the app, replacing AndroidNDK*.app with the actual app name. It should look something like:

export ANDROID_NDK_HOME=/path/to/AndroidNDK/AndroidNDK*.app/Contents/NDK/

Tip: make sure ANDROID_NDK_HOME points to the directory that contains README.md.
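In the export command above, AndroidNDK*.app stands for the actual app name, which includes the NDK version. Rather than relying on the shell to expand that wildcard, the hypothetical helper below resolves it explicitly: it prints the Contents/NDK directory of the single matching app and fails if zero or several NDK apps are found.

```shell
#!/bin/sh
# Hypothetical helper: resolve <dir>/AndroidNDK*.app to the single installed
# NDK app and print its Contents/NDK directory.
resolve_macos_ndk() {
  base=${1%/}          # directory holding the AndroidNDK*.app bundle(s)
  match=""
  count=0
  for app in "$base"/AndroidNDK*.app; do
    if [ -d "$app/Contents/NDK" ]; then
      match="$app/Contents/NDK"
      count=$((count + 1))
    fi
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one AndroidNDK*.app in $base, found $count" >&2
    return 1
  fi
  echo "$match"
}
```

For example: ANDROID_NDK_HOME=$(resolve_macos_ndk /path/to/AndroidNDK) && export ANDROID_NDK_HOME. The && keeps ANDROID_NDK_HOME unexported if no single match was found.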

With the NDK set up, build the litert_lm_main binary:

bazel build --config=android_arm64 //runtime/engine:litert_lm_main

Then build the dispatch API library, which the LiteRT-LM runtime needs to interact with the NPU:

bazel build --config=android_arm64 \
    @litert//litert/vendors/qualcomm/dispatch:dispatch_api_so

Step 4: Run the model on device

After the binary is built, you can run the model on the device. To do so, push a few assets and binaries to your Android device. First, set DEVICE_FOLDER to a directory you have write access to (typically somewhere under /data/local/tmp/):

export DEVICE_FOLDER=/data/local/tmp/
adb shell mkdir -p $DEVICE_FOLDER

export MODEL_PATH=<path to your downloaded .litertlm file>

Push the .litertlm file. Tip: you only need to push those assets once.

adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm
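Since the model only needs to be pushed once, a re-run can skip the push when a file of the same size is already on the device. The hypothetical helper below, push_if_changed, compares the local size (from wc -c) with the size reported by stat -c %s on the device. It assumes the device's stat supports -c, as the toybox stat on recent Android versions does, and a size match is only a heuristic, not a checksum. The adb command can be overridden through the ADB variable.

```shell
#!/bin/sh
# Hypothetical helper: push a file only if the device copy is missing or its
# size differs. Size equality is a heuristic, not an integrity check.
ADB=${ADB:-adb}

push_if_changed() {
  src=$1
  dst=$2
  local_size=$(wc -c < "$src" | tr -d ' ')
  # Empty, or an error message, if the file is missing on the device;
  # either way it will not match the local size.
  remote_size=$("$ADB" shell stat -c %s "$dst" 2>/dev/null | tr -d '\r')
  if [ "$local_size" = "$remote_size" ]; then
    echo "skip: $dst is up to date"
  else
    "$ADB" push "$src" "$dst"
  fi
}
```

For example: push_if_changed "$MODEL_PATH" "$DEVICE_FOLDER/model.litertlm".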

Push the QAIRT libraries. They are in the folder extracted in Step 2: the Android libraries under $QAIRT_ROOT/lib/aarch64-android/ and the Hexagon skeleton libraries under $QAIRT_ROOT/lib/hexagon-*/unsigned/. Note that QAIRT_ROOT should include the version number, e.g. 2.42.0.251225.

adb push $QAIRT_ROOT/lib/aarch64-android/libQnnHtp*Stub.so $DEVICE_FOLDER
adb push $QAIRT_ROOT/lib/aarch64-android/libQnnHtp.so $DEVICE_FOLDER
adb push $QAIRT_ROOT/lib/aarch64-android/libQnnSystem.so $DEVICE_FOLDER
adb push $QAIRT_ROOT/lib/aarch64-android/libQnnHtpPrepare.so $DEVICE_FOLDER
adb push $QAIRT_ROOT/lib/hexagon-*/unsigned/libQnnHtp*Skel.so $DEVICE_FOLDER
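The five push commands above can also be written as a loop that stops if a pattern matches no file, which usually means QAIRT_ROOT points at the wrong folder. The hypothetical helper below, push_qairt_libs, uses the same library patterns as those commands and pushes the matches one file at a time. Setting ADB=echo turns it into a dry run.

```shell
#!/bin/sh
# Hypothetical helper: push the QAIRT runtime libraries listed above,
# failing if any pattern matches no file under $1 (the QAIRT_ROOT).
ADB=${ADB:-adb}

push_qairt_libs() {
  root=$1
  dest=$2
  for pattern in \
      "lib/aarch64-android/libQnnHtp*Stub.so" \
      "lib/aarch64-android/libQnnHtp.so" \
      "lib/aarch64-android/libQnnSystem.so" \
      "lib/aarch64-android/libQnnHtpPrepare.so" \
      "lib/hexagon-*/unsigned/libQnnHtp*Skel.so"; do
    matched=0
    # $pattern is left unquoted so the shell expands its wildcards under $root.
    for lib in "$root"/$pattern; do
      if [ -f "$lib" ]; then
        matched=1
        "$ADB" push "$lib" "$dest" || return 1
      fi
    done
    if [ "$matched" -eq 0 ]; then
      echo "no match for $root/$pattern" >&2
      return 1
    fi
  done
}
```

For example: push_qairt_libs "$QAIRT_ROOT" "$DEVICE_FOLDER".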

Push the dispatch API library and the litert_lm_main binary built in Step 3, along with the prebuilt shared libraries in prebuilt/android_arm64/.

adb push bazel-bin/external/litert/litert/vendors/qualcomm/*/*.so \
    $DEVICE_FOLDER
adb push prebuilt/android_arm64/*.so $DEVICE_FOLDER
adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER

Now, you can execute the binary.

adb shell LD_LIBRARY_PATH=$DEVICE_FOLDER ADSP_LIBRARY_PATH=$DEVICE_FOLDER \
    $DEVICE_FOLDER/litert_lm_main \
    --backend=npu \
    --model_path=$DEVICE_FOLDER/model.litertlm

MediaTek NeuroPilot

To run an LLM on a MediaTek NPU, follow these steps:

Step 1: Download the .litertlm model

The LiteRT-LM runtime is designed to work with models in the .litertlm format. You can find and download compatible models in the following table.

Model	SoC	Quantization	Context size (tokens)	Model size (MB)	Download link
Gemma3-1B	MT6989	4-bit per-channel	1280	985	download
Gemma3-1B	MT6991	4-bit per-channel	1280	986	download

Download the model that matches your device's SoC. The following commands print the Hugging Face link for it; you must log in to Hugging Face and accept the form on the model page before you can download the file. Verify that $SOC_MODEL is listed in the table above, because the generated link does not work for unsupported SoCs.

SOC_MODEL=$(adb shell getprop ro.soc.model | tr '[:upper:]' '[:lower:]')
echo "https://huggingface.co/litert-community/Gemma3-1B-IT/blob/main/Gemma3-1B-IT_q4_ekv1280_${SOC_MODEL}.litertlm"
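The printed link opens the model's page on Hugging Face, where you can download the file after logging in and accepting the license form. To download from the command line instead, one option is curl with a Hugging Face access token. The hypothetical helper below, download_litertlm, assumes HF_TOKEN holds a token for an account that has accepted the form, and relies on Hugging Face serving the raw file when blob in the link is replaced with resolve.

```shell
#!/bin/sh
# Hypothetical helper: download a .litertlm file given the Hugging Face
# "blob" link printed above. Requires HF_TOKEN (a Hugging Face access token).
CURL=${CURL:-curl}

download_litertlm() {
  page_url=$1
  out_file=$2
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "set HF_TOKEN to a Hugging Face access token" >&2
    return 1
  fi
  # .../blob/main/<file> is the web page; .../resolve/main/<file> is the file.
  file_url=$(printf '%s' "$page_url" | sed 's#/blob/#/resolve/#')
  "$CURL" --fail --location \
    --header "Authorization: Bearer $HF_TOKEN" \
    --output "$out_file" "$file_url"
}
```

Pass the link printed by the echo command above as the first argument and an output file name such as model.litertlm as the second. --fail makes curl return an error instead of saving an error page if access is denied.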

Step 2: Build the LiteRT-LM runtime / libraries

Develop in Linux

To build the binary for Android, install Android NDK r28b or newer:

Download the zip file from https://developer.android.com/ndk/downloads#stable-downloads.
Extract the zip file to your preferred location (say /path/to/AndroidNDK/).
Set ANDROID_NDK_HOME to point to the NDK directory. It should look something like:

export ANDROID_NDK_HOME=/path/to/AndroidNDK/

Tip: make sure ANDROID_NDK_HOME points to the directory that contains README.md.

With the NDK set up, build the litert_lm_main binary:

bazel build --config=android_arm64 //runtime/engine:litert_lm_main

Then build the dispatch API library, which the LiteRT-LM runtime needs to interact with the NPU:

bazel build --config=android_arm64 \
    @litert//litert/vendors/mediatek/dispatch:dispatch_api_so
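The two bazel build commands differ between vendors only in the vendor directory of the dispatch target (qualcomm or mediatek). A hypothetical wrapper, build_for_vendor, that builds both targets for a given vendor could look like this. The BAZEL variable lets you substitute another launcher, such as bazelisk, or echo for a dry run.

```shell
#!/bin/sh
# Hypothetical wrapper: build litert_lm_main and the NPU dispatch library
# for one vendor, using the same targets as the commands above.
BAZEL=${BAZEL:-bazel}

build_for_vendor() {
  vendor=$1
  case "$vendor" in
    qualcomm|mediatek) ;;
    *) echo "unknown vendor '$vendor' (use qualcomm or mediatek)" >&2; return 1 ;;
  esac
  "$BAZEL" build --config=android_arm64 //runtime/engine:litert_lm_main &&
    "$BAZEL" build --config=android_arm64 \
      "@litert//litert/vendors/$vendor/dispatch:dispatch_api_so"
}
```

For this section, run build_for_vendor mediatek from the repository root.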

Develop in macOS

The Xcode command line tools include clang. If they are not installed yet, run xcode-select --install.

To build the binary for Android, install Android NDK r28b or newer:

Download the .dmg file from https://developer.android.com/ndk/downloads#stable-downloads.
Open the .dmg file and move the AndroidNDK* app to your preferred location (say /path/to/AndroidNDK/).
Set ANDROID_NDK_HOME to point to the NDK directory inside the app, replacing AndroidNDK*.app with the actual app name. It should look something like:

export ANDROID_NDK_HOME=/path/to/AndroidNDK/AndroidNDK*.app/Contents/NDK/

Tip: make sure ANDROID_NDK_HOME points to the directory that contains README.md.

With the NDK set up, build the litert_lm_main binary:

bazel build --config=android_arm64 //runtime/engine:litert_lm_main

Then build the dispatch API library, which the LiteRT-LM runtime needs to interact with the NPU:

bazel build --config=android_arm64 \
    @litert//litert/vendors/mediatek/dispatch:dispatch_api_so

Step 3: Run the model on device

After the binary is built, you can run the model on the device. First, set DEVICE_FOLDER to a directory you have write access to (typically somewhere under /data/local/tmp/):

export DEVICE_FOLDER=/data/local/tmp/
adb shell mkdir -p $DEVICE_FOLDER

export MODEL_PATH=<path to your downloaded .litertlm file>

Push the .litertlm file. Tip: you only need to push those assets once.

adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm

Push the dispatch API library and the litert_lm_main binary built in Step 2, along with the prebuilt shared libraries in prebuilt/android_arm64/.

adb push bazel-bin/external/litert/litert/vendors/mediatek/*/*.so \
    $DEVICE_FOLDER
adb push prebuilt/android_arm64/*.so $DEVICE_FOLDER
adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER

Now, you can execute the binary.

adb shell LD_LIBRARY_PATH=$DEVICE_FOLDER \
    $DEVICE_FOLDER/litert_lm_main \
    --backend=npu \
    --model_path=$DEVICE_FOLDER/model.litertlm
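The run commands for the two vendors differ only in ADSP_LIBRARY_PATH, which the Qualcomm command also sets to DEVICE_FOLDER so that the pushed Hexagon libQnnHtp*Skel.so libraries can be found. The hypothetical wrapper below, run_on_npu, builds either command from a vendor name and a device folder (defaulting to /data/local/tmp). Setting ADB=echo prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper: run litert_lm_main on the NPU with the environment
# used by the Qualcomm or MediaTek command on this page.
ADB=${ADB:-adb}

run_on_npu() {
  vendor=$1
  dir=${2:-/data/local/tmp}
  dir=${dir%/}
  case "$vendor" in
    qualcomm) env_vars="LD_LIBRARY_PATH=$dir ADSP_LIBRARY_PATH=$dir" ;;
    mediatek) env_vars="LD_LIBRARY_PATH=$dir" ;;
    *) echo "unknown vendor '$vendor'" >&2; return 1 ;;
  esac
  # adb shell runs this string in the device's shell, as in the commands above.
  "$ADB" shell "$env_vars $dir/litert_lm_main --backend=npu --model_path=$dir/model.litertlm"
}
```

For example: run_on_npu mediatek "$DEVICE_FOLDER", or run_on_npu qualcomm "$DEVICE_FOLDER" for the Qualcomm setup.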