
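Get started with LiteRT-LM on Android

The LiteRT-LM Kotlin API targets Android and the JVM (Linux, macOS, Windows) and supports GPU and NPU acceleration, multimodal input, and tool use.

Introduction

The following terminal chat app is built with the Kotlin API: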

import com.google.ai.edge.litertlm.*

suspend fun main() {
  Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // Hide log for TUI app

  val engineConfig = EngineConfig(modelPath = "/path/to/model.litertlm")
  Engine(engineConfig).use { engine ->
    engine.initialize()

    engine.createConversation().use { conversation ->
      while (true) {
        print("\n>>> ")
        conversation.sendMessageAsync(readln()).collect { print(it) }
      }
    }
  }
}

Animation: demo of the Kotlin sample code

To try the sample above, clone the repository and run example/Main.kt:

bazel run -c opt //kotlin/java/com/google/ai/edge/litertlm/example:main -- <abs_model_path>

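To browse available .litertlm models, visit the HuggingFace LiteRT Community. The animation above uses Gemma3-1B-IT.

For an Android sample, see the Google AI Edge Gallery app.

Get started with Gradle

Although LiteRT-LM is developed with Bazel, Maven packages are available for Gradle and Maven users.

1. Add the Gradle dependency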

dependencies {
    // For Android
    implementation("com.google.ai.edge.litertlm:litertlm-android:latest.release")

    // For JVM (Linux, MacOS, Windows)
    implementation("com.google.ai.edge.litertlm:litertlm-jvm:latest.release")
}

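The available versions of litertlm-android and litertlm-jvm are listed on Google Maven.

Use latest.release to resolve the most recent release.

2. Initialize the engine

Engine is the entry point to the API. Initialize it with a model path and configuration, and close it when you are done to release resources.

Note: engine.initialize() can take a significant amount of time (for example, up to 10 seconds) to load the model. Call it on a background thread or in a coroutine so that it does not block the UI thread.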

import com.google.ai.edge.litertlm.Backend
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig

val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm", // Replace with your model path
    backend = Backend.GPU(), // Or Backend.NPU(nativeLibraryDir = "...")
    // Optional: Pick a writable dir. This can improve 2nd load time.
    // cacheDir = "/tmp/" or context.cacheDir.path (for Android)
)

val engine = Engine(engineConfig)
engine.initialize()
// ... Use the engine to create a conversation ...

// Close the engine when done
engine.close()
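
Because initialization can block for several seconds, one way to keep it off the UI thread is to wrap it in a suspending helper. The following is a minimal sketch, assuming kotlinx.coroutines is on the classpath; loadEngine is a hypothetical helper name and is not part of LiteRT-LM.

```kotlin
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hypothetical helper (not part of LiteRT-LM): creates and initializes an
// Engine on Dispatchers.IO so the calling thread is not blocked while the
// model loads.
suspend fun loadEngine(modelPath: String): Engine = withContext(Dispatchers.IO) {
    val engine = Engine(EngineConfig(modelPath = modelPath))
    try {
        engine.initialize() // slow step: loads the model
        engine
    } catch (t: Throwable) {
        engine.close() // release resources if loading fails
        throw t
    }
}
```

The caller still owns the returned engine and should close it when it is no longer needed.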

On Android, to use the GPU backend, the app must explicitly request the native libraries it depends on by adding the following inside the <application> tag in AndroidManifest.xml:

  <application>
    <uses-native-library android:name="libvndksupport.so" android:required="false"/>
    <uses-native-library android:name="libOpenCL.so" android:required="false"/>
  </application>

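To use the NPU backend, you may need to specify the directory that contains the NPU libraries. On Android, if the libraries are bundled with the app, set it to context.applicationInfo.nativeLibraryDir. For details on the NPU native libraries, see LiteRT-LM NPU.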

val engineConfig = EngineConfig(
    modelPath = modelPath,
    backend = Backend.NPU(nativeLibraryDir = context.applicationInfo.nativeLibraryDir)
)

3. Create a conversation

After the engine is initialized, create a Conversation instance. You can pass a ConversationConfig to customize its behavior.

import com.google.ai.edge.litertlm.Contents
import com.google.ai.edge.litertlm.ConversationConfig
import com.google.ai.edge.litertlm.Message
import com.google.ai.edge.litertlm.SamplerConfig

// Optional: Configure the system instruction, initial messages, sampling
// parameters, etc.
val conversationConfig = ConversationConfig(
    systemInstruction = Contents.of("You are a helpful assistant."),
    initialMessages = listOf(
        Message.user("What is the capital city of the United States?"),
        Message.model("Washington, D.C."),
    ),
    samplerConfig = SamplerConfig(topK = 10, topP = 0.95, temperature = 0.8),
)

val conversation = engine.createConversation(conversationConfig)
// Or with default config:
// val conversation = engine.createConversation()

// ... Use the conversation ...

// Close the conversation when done
conversation.close()

Conversation implements AutoCloseable, so for one-off or short-lived conversations you can use a use block to manage resources automatically:

engine.createConversation(conversationConfig).use { conversation ->
    // Interact with the conversation
}

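4. Send messages

There are three ways to send a message:

sendMessage(contents): Message: a synchronous call that blocks until the model returns the complete response. This is the simplest option for basic request/response interactions.
sendMessageAsync(contents, callback): an asynchronous call that streams the response. This suits long-running requests or cases where you want to display the response as it is generated.
sendMessageAsync(contents): Flow<Message>: an asynchronous call that returns a Kotlin Flow for streaming the response. This is the recommended approach for coroutine users.

Synchronous example: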

import com.google.ai.edge.litertlm.Content
import com.google.ai.edge.litertlm.Message

print(conversation.sendMessage("What is the capital of France?"))

Asynchronous example with a callback:

Use sendMessageAsync to send a message to the model and receive the response through a callback.

import com.google.ai.edge.litertlm.Content
import com.google.ai.edge.litertlm.Message
import com.google.ai.edge.litertlm.MessageCallback
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit

val callback = object : MessageCallback {
    override fun onMessage(message: Message) {
        print(message)
    }

    override fun onDone() {
        // Streaming completed
    }

    override fun onError(throwable: Throwable) {
        // Error during streaming
    }
}

conversation.sendMessageAsync("What is the capital of France?", callback)
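
In a command-line program, the process may exit before the callback finishes streaming. One way to wait is a CountDownLatch that the callback releases. The following is a sketch only: sendAndAwait is a hypothetical helper, and the package of Conversation is assumed to match the other imports in this guide.

```kotlin
import com.google.ai.edge.litertlm.Conversation
import com.google.ai.edge.litertlm.Message
import com.google.ai.edge.litertlm.MessageCallback
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper (not part of LiteRT-LM): prints the streamed reply and
// blocks until onDone or onError fires, or until the timeout elapses.
// Returns true if streaming finished within the timeout.
fun sendAndAwait(
    conversation: Conversation,
    prompt: String,
    timeoutSeconds: Long = 60,
): Boolean {
    val done = CountDownLatch(1)
    val failure = AtomicReference<Throwable?>(null)

    val callback = object : MessageCallback {
        override fun onMessage(message: Message) {
            print(message) // print each streamed chunk as it arrives
        }

        override fun onDone() {
            done.countDown()
        }

        override fun onError(throwable: Throwable) {
            failure.set(throwable)
            done.countDown()
        }
    }

    conversation.sendMessageAsync(prompt, callback)
    val finished = done.await(timeoutSeconds, TimeUnit.SECONDS)
    failure.get()?.let { throw it } // rethrow a streaming error on the caller's thread
    return finished
}
```

Avoid blocking like this on the Android UI thread; there, the Flow variant is usually a better fit.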

Asynchronous example with Flow:

Use sendMessageAsync (without the callback argument) to send a message to the model and receive the response through a Kotlin Flow.

import com.google.ai.edge.litertlm.Content
import com.google.ai.edge.litertlm.Message
import kotlinx.coroutines.flow.catch
import kotlinx.coroutines.launch

// Within a coroutine scope
conversation.sendMessageAsync("What is the capital of France?")
    .catch { e -> println("Streaming error: $e") } // Error during streaming
    .collect { print(it.toString()) }

5. Multimodality

A Message can contain different types of Content, including Text, ImageBytes, ImageFile, AudioBytes, and AudioFile.

// Initialize the `visionBackend` and/or the `audioBackend`
val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm", // Replace with your model path
    backend = Backend.CPU(), // Or Backend.GPU() or Backend.NPU(...)
    visionBackend = Backend.GPU(), // Or Backend.NPU(...)
    audioBackend = Backend.CPU(), // Or Backend.NPU(...)
)

// Sends a message with multi-modality.
// See the Content class for other variants.
conversation.sendMessage(Contents.of(
    Content.ImageFile("/path/to/image"),
    Content.AudioBytes(audioBytes), // ByteArray of the audio
    Content.Text("Describe this image and audio."),
))

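6. Define and use tools

There are two ways to define tools:

With Kotlin functions (recommended for most cases)
With an OpenAPI specification (full control over the tool specification and execution)

Define tools with Kotlin functions

You can define custom Kotlin functions as tools that the model can call to perform actions or fetch information.

Create a class that implements ToolSet, and annotate its methods with @Tool and their parameters with @ToolParam.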

import com.google.ai.edge.litertlm.Tool
import com.google.ai.edge.litertlm.ToolParam
import com.google.ai.edge.litertlm.ToolSet

class SampleToolSet: ToolSet {
    @Tool(description = "Get the current weather for a city")
    fun getCurrentWeather(
        @ToolParam(description = "The city name, e.g., San Francisco") city: String,
        @ToolParam(description = "Optional country code, e.g., US") country: String? = null,
        @ToolParam(description = "Temperature unit (celsius or fahrenheit). Default: celsius") unit: String = "celsius"
    ): Map<String, Any> {
        // In a real application, you would call a weather API here
        return mapOf("temperature" to 25, "unit" to unit, "condition" to "Sunny")
    }

    @Tool(description = "Get the sum of a list of numbers.")
    fun sum(
        @ToolParam(description = "The numbers, could be floating point.") numbers: List<Double>,
    ): Double {
        return numbers.sum()
    }
}

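Under the hood, the API inspects these annotations and the function signatures to generate an OpenAPI-style schema. The schema describes each tool's purpose, its parameters (including the types and the descriptions from @ToolParam), and its return type to the language model.

Parameter types

Parameters annotated with @ToolParam can be of type String, Int, Boolean, Float, Double, or a List of these types (for example, List<String>). Use a nullable type (for example, String?) to indicate a nullable parameter. Set a default value to mark a parameter as optional, and mention the default in the @ToolParam description.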
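
As an illustration of these rules, the sketch below combines a List parameter, a nullable parameter, and parameters with defaults in a single tool. NoteToolSet and searchNotes are hypothetical names invented for this example, and the function body is a stub.

```kotlin
import com.google.ai.edge.litertlm.Tool
import com.google.ai.edge.litertlm.ToolParam
import com.google.ai.edge.litertlm.ToolSet

// Hypothetical tool set, for illustration only.
class NoteToolSet : ToolSet {
    @Tool(description = "Search saved notes by tag.")
    fun searchNotes(
        // A List of a supported element type.
        @ToolParam(description = "Tags to match, e.g., work, urgent") tags: List<String>,
        // Nullable type: the parameter may be null.
        @ToolParam(description = "Optional author name. Default: any author") author: String? = null,
        // Default values mark the parameter as optional; the description repeats the default.
        @ToolParam(description = "Maximum number of results. Default: 5") limit: Int = 5,
        @ToolParam(description = "Include archived notes. Default: false") includeArchived: Boolean = false,
    ): Map<String, Any> {
        // Stub result: a real implementation would query a note store here.
        return mapOf(
            "matches" to emptyList<String>(),
            "limit" to limit,
            "includeArchived" to includeArchived,
        )
    }
}
```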

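Return types

A tool function can return any Kotlin type. The result is converted to a JSON element before it is sent back to the model.

List types are converted to JSON arrays.
Map types are converted to JSON objects.
Primitive types (String, Number, Boolean) are converted to the corresponding JSON primitives.
Other types are converted to strings with toString().

For structured data, returning a Map or a data class that is converted to a JSON object is recommended.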
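
To make the conversion rules concrete, here is a small standalone sketch that applies the same mapping using only the Kotlin standard library. It is an illustration of the listed rules, not LiteRT-LM's actual implementation; toJsonString is a hypothetical name, and the handling of null is an assumption.

```kotlin
// Illustrative only: mirrors the documented conversion rules.
fun toJsonString(value: Any?): String = when (value) {
    null -> "null" // assumption: null maps to JSON null
    is String -> "\"" + escapeJson(value) + "\""
    is Number, is Boolean -> value.toString()
    is List<*> -> value.joinToString(separator = ",", prefix = "[", postfix = "]") {
        toJsonString(it)
    }
    is Map<*, *> -> value.entries.joinToString(separator = ",", prefix = "{", postfix = "}") { (k, v) ->
        toJsonString(k.toString()) + ":" + toJsonString(v)
    }
    else -> toJsonString(value.toString()) // any other type falls back to toString()
}

// Escapes only backslashes and double quotes, which is enough for this sketch.
private fun escapeJson(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")
```

Under these rules, a Map keeps its structure as a JSON object, while a type that falls through to toString() reaches the model as a single opaque string.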

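Define tools with an OpenAPI specification

Alternatively, you can define a tool by implementing the OpenApiTool class and providing the tool's description as a JSON string that conforms to the OpenAPI specification. This approach is useful if you already have an OpenAPI schema for the tool or need fine-grained control over its definition.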

import com.google.ai.edge.litertlm.OpenApiTool

class SampleOpenApiTool : OpenApiTool {

    override fun getToolDescriptionJsonString(): String {
        return """
        {
          "name": "addition",
          "description": "Add all numbers.",
          "parameters": {
            "type": "object",
            "properties": {
              "numbers": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "description": "The list of numbers to sum."
              }
            },
            "required": [
              "numbers"
            ]
          }
        }
        """.trimIndent() // Tip: trim to save tokens
    }

    override fun execute(paramsJsonString: String): String {
        // Parse paramsJsonString with your choice of parser/deserializer and
        // execute the tool.

        // Return the result as a JSON string
        return """{"result": 1.4142}"""
    }
}
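
The execute method above returns a fixed value. A sketch of a body that actually sums the input might look like the following. It assumes the arguments arrive as a JSON object keyed by parameter name (for example {"numbers": [1, 2.5]}), which this guide does not confirm, and it uses org.json, which ships with Android but needs a separate dependency on the JVM. executeAddition is a hypothetical function name.

```kotlin
import org.json.JSONObject

// Hypothetical body for OpenApiTool.execute(). Assumes paramsJsonString is a
// JSON object with a "numbers" array, matching the schema declared by the tool.
fun executeAddition(paramsJsonString: String): String {
    val numbers = JSONObject(paramsJsonString).getJSONArray("numbers")
    var sum = 0.0
    for (i in 0 until numbers.length()) {
        sum += numbers.getDouble(i)
    }
    // Return the result as a JSON string.
    return JSONObject().put("result", sum).toString()
}
```

Inside a tool class, execute(paramsJsonString) could delegate to a function like this.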

Register tools

Include instances of your tools in ConversationConfig.

val conversation = engine.createConversation(
    ConversationConfig(
        tools = listOf(
            tool(SampleToolSet()),
            tool(SampleOpenApiTool()),
        ),
        // ... other configs
    )
)

// Send messages that might trigger the tool
conversation.sendMessageAsync("What's the weather like in London?", callback)

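The model decides when to call a tool based on the conversation. The tool's result is sent back to the model automatically to generate the final response.

Manual tool calling

By default, LiteRT-LM automatically executes the tool calls that the model generates and sends the results back to the model to generate the next response.

To execute tools yourself and send the results back to the model, set automaticToolCalling to false in ConversationConfig.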

val conversation = engine.createConversation(
    ConversationConfig(
        tools = listOf(
            tool(SampleOpenApiTool()),
        ),
        automaticToolCalling = false,
    )
)

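With automatic tool calling disabled, your application code must execute the tools and send the results back to the model. When automaticToolCalling is false, the execute method of an OpenApiTool is not called automatically.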

// Send a message that triggers a tool call.
val responseMessage = conversation.sendMessage("What's the weather like in London?")

// The model returns a Message with `toolCalls` populated.
if (responseMessage.toolCalls.isNotEmpty()) {
    val toolResponses = mutableListOf<Content.ToolResponse>()
    // There can be multiple tool calls in a single response.
    for (toolCall in responseMessage.toolCalls) {
        println("Model wants to call: ${toolCall.name} with arguments: ${toolCall.arguments}")

        // Execute the tool manually with your own logic. `executeTool` is just an example here.
        val toolResponseJson = executeTool(toolCall.name, toolCall.arguments)

        // Collect tool responses.
        toolResponses.add(Content.ToolResponse(toolCall.name, toolResponseJson))
    }

    // Use Message.tool to create the tool response message.
    val toolResponseMessage = Message.tool(Contents.of(toolResponses))

    // Send the tool response message to the model.
    val finalMessage = conversation.sendMessage(toolResponseMessage)
    println("Final answer: ${finalMessage.text}") // e.g., "The weather in London is 25c."
}

Example

To try tool use, clone the repository and run example/ToolMain.kt:

bazel run -c opt //kotlin/java/com/google/ai/edge/litertlm/example:tool -- <abs_model_path>

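Error handling

API methods can throw LiteRtLmJniException for errors from the native layer, or standard Kotlin exceptions such as IllegalStateException for lifecycle problems. Always wrap API calls in a try-catch block. The onError callback of MessageCallback also reports errors that occur during asynchronous operations.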
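
A minimal sketch of that pattern for a synchronous call follows. trySend is a hypothetical helper, and the packages of Conversation and LiteRtLmJniException are assumed to match the other imports in this guide.

```kotlin
import com.google.ai.edge.litertlm.Conversation
import com.google.ai.edge.litertlm.LiteRtLmJniException

// Hypothetical helper (not part of LiteRT-LM): returns the reply as a string,
// or null if the call fails.
fun trySend(conversation: Conversation, prompt: String): String? =
    try {
        conversation.sendMessage(prompt).toString()
    } catch (e: LiteRtLmJniException) {
        // Error reported by the native layer.
        System.err.println("Native error: ${e.message}")
        null
    } catch (e: IllegalStateException) {
        // Lifecycle problem reported by the API.
        System.err.println("Lifecycle error: ${e.message}")
        null
    }
```

Returning null keeps the sketch short; a real app would more likely surface the failure to the user or retry.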