Introducing LiteRT: Google's high-performance runtime for on-device AI, formerly known as TensorFlow Lite.

LiteRT in Google Play services Java (and Kotlin) API

LiteRT in Google Play services can also be accessed using Java APIs, which can be used from Java or Kotlin code, in addition to the Native API. In particular, LiteRT in Google Play services is available through the LiteRT Interpreter API.

Using the Interpreter APIs

The LiteRT Interpreter API, provided by the LiteRT (formerly TensorFlow Lite) runtime, offers a general-purpose interface for building and running ML models. Follow these steps to run inferences with the Interpreter API using the LiteRT in Google Play services runtime.

1. Add project dependencies

Add the following dependencies to your app project code to access the Play services API for LiteRT:

dependencies {
...
    // LiteRT dependencies for Google Play services
    implementation 'com.google.android.gms:play-services-tflite-java:16.1.0'
    // Optional: include LiteRT Support Library
    implementation 'com.google.android.gms:play-services-tflite-support:16.1.0'
...
}

2. Add initialization of LiteRT

Initialize the LiteRT component of the Google Play services API before using the LiteRT APIs:

Kotlin:

val initializeTask: Task<Void> by lazy { TfLite.initialize(this) }

Java:

Task<Void> initializeTask = TfLite.initialize(context);

3. Create an Interpreter and set runtime option

Create an interpreter using InterpreterApi.create() and configure it to use Google Play services runtime, by calling InterpreterApi.Options.setRuntime(), as shown in the following example code:

Kotlin:

import org.tensorflow.lite.InterpreterApi
import org.tensorflow.lite.InterpreterApi.Options.TfLiteRuntime
...
private lateinit var interpreter: InterpreterApi
...
initializeTask.addOnSuccessListener {
    // Create the interpreter only after Play services initialization succeeds.
    val interpreterOption =
      InterpreterApi.Options().setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
    interpreter = InterpreterApi.create(modelBuffer, interpreterOption)
  }
  .addOnFailureListener { e ->
    Log.e("Interpreter", "Cannot initialize interpreter", e)
  }

Java:

import org.tensorflow.lite.InterpreterApi;
import org.tensorflow.lite.InterpreterApi.Options.TfLiteRuntime;
...
private InterpreterApi interpreter;
...
initializeTask.addOnSuccessListener(a -> {
    // Create the interpreter only after Play services initialization succeeds.
    interpreter = InterpreterApi.create(modelBuffer,
      new InterpreterApi.Options().setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY));
  })
  .addOnFailureListener(e -> {
    Log.e("Interpreter", String.format("Cannot initialize interpreter: %s",
          e.getMessage()));
  });

You should use the implementation above because it avoids blocking the Android user interface thread. If you need to manage thread execution more closely, you can add a Tasks.await() call to interpreter creation:

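Kotlin:

import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.tasks.await // from kotlinx-coroutines-play-services
...
lifecycleScope.launchWhenStarted { // uses coroutine
  initializeTask.await()
}

Java:

import java.util.concurrent.ExecutionException;
...
@BackgroundThread
InterpreterApi initializeInterpreter() throws ExecutionException, InterruptedException {
    // Tasks.await() blocks, so call this only from a background thread.
    Tasks.await(initializeTask);
    return InterpreterApi.create(...);
}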
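One way to keep a blocking call such as Tasks.await() off the UI thread is to hand it to a background executor and receive the result through a future. The following is a minimal sketch in plain Java, not part of the Play services API: the class name BlockingInitRunner and the method initializeAsync are hypothetical, and the Callable stands in for the body of initializeInterpreter() shown above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical helper: runs a blocking initialization step on a background executor. */
public final class BlockingInitRunner {
    private BlockingInitRunner() {}

    /**
     * Executes blockingInit on the given executor and returns a future that
     * completes with its result, or completes exceptionally if it throws.
     */
    public static <T> CompletableFuture<T> initializeAsync(
            Callable<T> blockingInit, Executor background) {
        CompletableFuture<T> result = new CompletableFuture<>();
        background.execute(() -> {
            try {
                // The blocking work happens here, on the executor's thread.
                result.complete(blockingInit.call());
            } catch (Exception e) {
                // Surface the failure to whoever observes the future.
                result.completeExceptionally(e);
            }
        });
        return result;
    }
}
```

In an app, the Callable would presumably wrap the two calls from the Java example above (Tasks.await(initializeTask) followed by InterpreterApi.create(...)), and the caller would attach a callback to the returned future rather than block on it from the UI thread.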

4. Run inferences

Using the interpreter object you created, call the run() method to generate an inference.

Kotlin:

interpreter.run(inputBuffer, outputBuffer)

Java:

interpreter.run(inputBuffer, outputBuffer);
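The page does not show how inputBuffer and outputBuffer are built, so the following is a minimal sketch of one common arrangement: a direct, native-order ByteBuffer for the input and a float array for the output. The tensor shapes (one 224x224 RGB float32 image in, 1000 class scores out) and the class name InferenceBuffers are assumptions for illustration; the real sizes depend on your model.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative buffer helpers; the tensor shapes are assumptions, not taken from this page. */
public final class InferenceBuffers {
    // Hypothetical model: one 224x224 RGB float32 image in, 1000 class scores out.
    public static final int HEIGHT = 224;
    public static final int WIDTH = 224;
    public static final int CHANNELS = 3;
    public static final int NUM_CLASSES = 1000;
    private static final int BYTES_PER_FLOAT = 4;

    private InferenceBuffers() {}

    /** Allocates a direct, native-order buffer sized for one input image. */
    public static ByteBuffer newInputBuffer() {
        ByteBuffer buffer =
            ByteBuffer.allocateDirect(HEIGHT * WIDTH * CHANNELS * BYTES_PER_FLOAT);
        buffer.order(ByteOrder.nativeOrder());
        return buffer;
    }

    /** Copies already-normalized pixel values into the buffer, then rewinds it. */
    public static void fillInput(ByteBuffer input, float[] pixels) {
        if (pixels.length != HEIGHT * WIDTH * CHANNELS) {
            throw new IllegalArgumentException("Unexpected pixel count: " + pixels.length);
        }
        input.rewind();
        for (float value : pixels) {
            input.putFloat(value);
        }
        input.rewind();
    }

    /** Allocates an output array with a batch dimension of 1. */
    public static float[][] newOutputBuffer() {
        return new float[1][NUM_CLASSES];
    }

    /** Returns the index of the highest score, or -1 for an empty array. */
    public static int argMax(float[] scores) {
        int best = -1;
        for (int i = 0; i < scores.length; i++) {
            if (best == -1 || scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With these helpers, a call such as interpreter.run(input, output) would fill output[0] with the scores, and argMax(output[0]) would give the index of the top-scoring class under the assumed model.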

Hardware acceleration

LiteRT allows you to accelerate the performance of your model using specialized hardware processors, such as graphics processing units (GPUs). You can take advantage of these specialized processors using hardware drivers called delegates.

The GPU delegate is provided through Google Play services and is dynamically loaded, just like the Play services versions of the Interpreter API.

Checking device compatibility

Not all devices support GPU hardware acceleration with LiteRT. To avoid errors and potential crashes, use the TfLiteGpu.isGpuDelegateAvailable method to check whether a device is compatible with the GPU delegate, and fall back to the CPU when the GPU is not supported.

useGpuTask = TfLiteGpu.isGpuDelegateAvailable(context)

Once you have a variable like useGpuTask, you can use it to determine whether devices use the GPU delegate.

Kotlin:

val interpreterTask = useGpuTask.continueWith { task ->
  val interpreterOptions = InterpreterApi.Options()
      .setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
  // Add the GPU delegate only when the device supports it; otherwise run on CPU.
  if (task.result) {
      interpreterOptions.addDelegateFactory(GpuDelegateFactory())
  }
  InterpreterApi.create(FileUtil.loadMappedFile(context, MODEL_PATH), interpreterOptions)
}

Java:

Task<InterpreterApi.Options> interpreterOptionsTask = useGpuTask.continueWith(task -> {
  InterpreterApi.Options options =
      new InterpreterApi.Options().setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY);
  // Add the GPU delegate only when the device supports it; otherwise run on CPU.
  if (task.getResult()) {
     options.addDelegateFactory(new GpuDelegateFactory());
  }
  return options;
});

GPU with Interpreter APIs

To use the GPU delegate with the Interpreter APIs:

Update the project dependencies to use the GPU delegate from Play services:
```
implementation 'com.google.android.gms:play-services-tflite-gpu:16.2.0'
```

Enable the GPU delegate option in the LiteRT initialization:

Kotlin:

TfLite.initialize(context,
  TfLiteInitializationOptions.builder()
    .setEnableGpuDelegateSupport(true)
    .build())

Java:

TfLite.initialize(context,
  TfLiteInitializationOptions.builder()
    .setEnableGpuDelegateSupport(true)
    .build());

Enable the GPU delegate in the interpreter options: set the delegate factory to GpuDelegateFactory by calling addDelegateFactory() on InterpreterApi.Options():

Kotlin:

val interpreterOption = InterpreterApi.Options()
  .setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
  .addDelegateFactory(GpuDelegateFactory())

Java:

InterpreterApi.Options interpreterOption = new InterpreterApi.Options()
  .setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
  .addDelegateFactory(new GpuDelegateFactory());

Migrating from stand-alone LiteRT

If you are planning to migrate your app from stand-alone LiteRT to the Play services API, review the following additional guidance for updating your app project code:

Review the Limitations section of this page to ensure your use case is supported.
Prior to updating your code, we recommend doing performance and accuracy checks for your models, particularly if you are using versions of LiteRT (TF Lite) earlier than version 2.1, so you have a baseline to compare against the new implementation.
If you have migrated all of your code to use the Play services API for LiteRT, you should remove the existing LiteRT runtime library dependencies (entries with org.tensorflow:tensorflow-lite:*) from your build.gradle file so that you can reduce your app size.
Identify all occurrences of new Interpreter object creation in your code, and modify each one so that it uses the InterpreterApi.create() call. Because the new TfLite.initialize call is asynchronous, it is in most cases not a drop-in replacement: you must register a listener that runs when the call completes. Refer to the code snippet in Step 3.
Add import org.tensorflow.lite.InterpreterApi; and import org.tensorflow.lite.InterpreterApi.Options.TfLiteRuntime; to any source files using the org.tensorflow.lite.Interpreter or org.tensorflow.lite.InterpreterApi classes.
If any of the resulting calls to InterpreterApi.create() have only a single argument, append new InterpreterApi.Options() to the argument list.
Append .setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY) to the last argument of any calls to InterpreterApi.create().
Replace all other occurrences of the org.tensorflow.lite.Interpreter class with org.tensorflow.lite.InterpreterApi.

If you want to use stand-alone LiteRT and the Play services API side-by-side, you must use LiteRT (TF Lite) version 2.9 or later. LiteRT (TF Lite) version 2.8 and earlier versions are not compatible with the Play services API version.