Benchmark CompiledModel API

LiteRT benchmark tools measure and calculate statistics for the following important performance metrics:

  • Initialization time
  • Inference time of warmup state
  • Inference time of steady state
  • Memory usage during initialization time
  • Overall memory usage

The CompiledModel benchmark tool is provided as a C++ binary, benchmark_model. You can execute this tool from a shell command line on Android, Linux, macOS, Windows, and embedded devices with GPU acceleration enabled.

Download prebuilt benchmark binaries

Download the nightly prebuilt command-line binaries from the links below:

Build benchmark binary from source

You can build the benchmark binary from source.

bazel build -c opt //litert/tools:benchmark_model

To build with the Android NDK toolchain, first set up the build environment by following this guide, or use the Docker image as described in this guide.

bazel build -c opt --config=android_arm64 \
  //litert/tools:benchmark_model
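After cross-compiling, one way to run the binary on a connected Android device is to push it with adb and execute it from the device shell. A sketch, assuming the default bazel-bin output path; the model path and all device paths are illustrative:

```shell
# Push the cross-compiled binary to a writable location on the device.
adb push bazel-bin/litert/tools/benchmark_model /data/local/tmp

# Make sure the binary is executable on the device.
adb shell chmod +x /data/local/tmp/benchmark_model

# Run the benchmark against a model already copied to the device.
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/your_model.tflite \
  --num_threads=4
```

The same pattern works for the GPU and profiler flags described below; only the flags passed to the binary change.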

Run benchmark

To run benchmarks, execute the binary from a shell, pointing it at your model:

path/to/downloaded_or_built/benchmark_model \
  --graph=your_model.tflite \
  --num_threads=4

More parameter options can be found in the source code of benchmark_model.

Benchmark GPU acceleration

The prebuilt binaries include the LiteRT GPU Accelerator, which supports:
  • Android: OpenCL
  • Linux: OpenCL and WebGPU (backed by Vulkan)
  • macOS: Metal
  • Windows: WebGPU (backed by Direct3D)

To use the GPU Accelerator, pass the flag --use_gpu=true.
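For example, to run the same benchmark with GPU acceleration enabled (the binary and model paths are placeholders, as in the earlier example):

```shell
# Benchmark with the LiteRT GPU Accelerator enabled.
path/to/downloaded_or_built/benchmark_model \
  --graph=your_model.tflite \
  --use_gpu=true
```

On platforms without a supported GPU backend, the run reports an error or falls back depending on configuration, so check the tool's output for which accelerator was actually used.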

Profile model ops

The benchmark_model binary also lets you profile model ops and obtain the execution time of each operator. To do this, pass the flag --use_profiler=true when invoking benchmark_model.
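For instance, to collect per-op execution times alongside the overall latency figures (paths are placeholders, as before):

```shell
# Benchmark with the per-operator profiler enabled.
path/to/downloaded_or_built/benchmark_model \
  --graph=your_model.tflite \
  --num_threads=4 \
  --use_profiler=true
```

Profiling adds some instrumentation overhead, so the per-op timings are best read as relative costs rather than exact steady-state latencies.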