LiteRT CompiledModel Kotlin API

The LiteRT CompiledModel API is available in Kotlin, offering Android developers a seamless, accelerator-first experience with high-level APIs. For an example, see the Image segmentation Kotlin App.

The following guide shows basic CPU inference with the CompiledModel Kotlin API. For advanced acceleration features, see the guides on GPU acceleration and NPU acceleration.

Add Maven package

Add the LiteRT Maven package to your Android project:

dependencies {
  ...
  implementation("com.google.ai.edge.litert:litert:2.1.0")
}

Basic inference

Create CompiledModel

Initialize the runtime with a model and your choice of hardware acceleration:

val model =
  CompiledModel.create(
    context.assets,
    "mymodel.tflite",
    CompiledModel.Options(Accelerator.CPU),
    env,
  )
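
You can select a different accelerator by changing the value passed to CompiledModel.Options. As a sketch, assuming the Accelerator enum also exposes a GPU entry as described in the GPU acceleration guide:

// Assumption: Accelerator.GPU is available; see the GPU acceleration guide.
val gpuModel =
  CompiledModel.create(
    context.assets,
    "mymodel.tflite",
    CompiledModel.Options(Accelerator.GPU),
    env,
  )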

Create input and output buffers

Create the data structures (buffers) that hold the input data you will feed into the model for inference, and the output data the model produces after running inference.

val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()

If you are using CPU memory, fill the inputs by writing data directly into the first input buffer:

inputBuffers[0].writeFloat(FloatArray(dataSize) { dataValue /* your data */ })
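
For a model with more than one input, write each input buffer by its index before running inference. A sketch, assuming two float input tensors (the sizes and values are placeholders):

// Hypothetical sizes; each array must match its tensor's shape.
inputBuffers[0].writeFloat(FloatArray(firstInputSize) { /* your data */ 0f })
inputBuffers[1].writeFloat(FloatArray(secondInputSize) { /* your data */ 0f })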

Invoke the model

Run the model, passing in the input and output buffers:

model.run(inputBuffers, outputBuffers)

Retrieve outputs

Retrieve the outputs by reading the model output directly from memory:

val outputFloatArray = outputBuffers[0].readFloat()
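
Putting the steps together, here is a minimal end-to-end sketch using only the calls shown above, assuming a model with a single float input tensor and a single float output tensor:

// Compile the model for the CPU.
val model =
  CompiledModel.create(
    context.assets,
    "mymodel.tflite",
    CompiledModel.Options(Accelerator.CPU),
    env,
  )

// Allocate I/O buffers, write the input, run, and read the result.
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()
inputBuffers[0].writeFloat(FloatArray(dataSize) { /* your data */ 0f })
model.run(inputBuffers, outputBuffers)
val result = outputBuffers[0].readFloat()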

Use TensorBuffer

LiteRT provides built-in support for I/O buffer interoperability through the Tensor Buffer API (TensorBuffer), which handles the flow of data into and out of the CompiledModel. The Tensor Buffer API provides typed methods to write data (such as writeFloat()), read data (such as readFloat()), and lock buffers.
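
For example, once inference has run, the output TensorBuffer can be read back into an ordinary Kotlin array and post-processed like any other data. A sketch that picks the top-scoring index from a hypothetical classification output:

// Read the scores out of the output TensorBuffer (see readFloat() above).
val scores: FloatArray = outputBuffers[0].readFloat()
// Ordinary Kotlin from here on: index of the highest score.
val topClass = scores.indices.maxByOrNull { scores[it] }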

For a more complete view of how the Tensor Buffer API is implemented, see the source code at TensorBuffer.kt.