

Get started with LiteRT-LM on Android

The Kotlin API of LiteRT-LM for Android and JVM (Linux, macOS, Windows), with features such as GPU and NPU acceleration, multimodality, and tool use.

Introduction

Here is a sample terminal chat app built with the Kotlin API:

import com.google.ai.edge.litertlm.*

suspend fun main() {
  Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // Hide log for TUI app

  val engineConfig = EngineConfig(modelPath = "/path/to/model.litertlm")
  Engine(engineConfig).use { engine ->
    engine.initialize()

    engine.createConversation().use { conversation ->
      while (true) {
        print("\n>>> ")
        conversation.sendMessageAsync(readln()).collect { print(it) }
      }
    }
  }
}

[Animation: Kotlin sample code demo]

To try the sample above, clone the repository and run example/Main.kt:

bazel run -c opt //kotlin/java/com/google/ai/edge/litertlm/example:main -- <abs_model_path>

Available .litertlm models are on the HuggingFace LiteRT Community. The animation above uses Gemma3-1B-IT.

For an Android sample, see the Google AI Edge Gallery app (available on Google Play).

Get started with Gradle

Although LiteRT-LM is developed with Bazel, Maven packages are provided for Gradle and Maven users.

Add the Gradle dependency

dependencies {
    // For Android
    implementation("com.google.ai.edge.litertlm:litertlm-android:latest.release")

    // For JVM (Linux, macOS, Windows)
    implementation("com.google.ai.edge.litertlm:litertlm-jvm:latest.release")
}

Check Google Maven for the available versions of litertlm-android and litertlm-jvm.

Use latest.release to get the latest version.

Initialize the engine

Engine is the entry point to the API. Initialize it with the model path and a configuration, and close it when you are done to release its resources.

Note: engine.initialize() can take a significant amount of time to load the model (for example, up to 10 seconds). Call it on a background thread or in a coroutine so that it does not block the UI thread.
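A minimal sketch of keeping the slow call off the calling thread using only the JDK (`ExecutorService` and `CompletableFuture`). `SlowEngine` is a hypothetical stand-in for `Engine` so that the snippet runs on its own; in an Android app a coroutine such as `withContext(Dispatchers.IO) { engine.initialize() }` is the more common choice.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical stand-in for Engine: slow to initialize and must be closed.
class SlowEngine : AutoCloseable {
    @Volatile
    var initialized = false
        private set

    @Volatile
    var initThreadName: String? = null
        private set

    @Volatile
    var closed = false
        private set

    fun initialize() {
        initThreadName = Thread.currentThread().name
        Thread.sleep(100) // Simulates model loading, which can take seconds in practice.
        initialized = true
    }

    override fun close() {
        closed = true
    }
}

// Creates and initializes the engine on `executor`; the caller gets a future immediately.
fun initializeInBackground(executor: ExecutorService): CompletableFuture<SlowEngine> =
    CompletableFuture.supplyAsync({
        val engine = SlowEngine()
        try {
            engine.initialize()
            engine
        } catch (t: Throwable) {
            engine.close() // Release resources if loading fails.
            throw t
        }
    }, executor)
```

In an app you would react to completion, for example with `thenAccept`, rather than blocking on `get()`, and switch back to the main thread before updating the UI.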

import com.google.ai.edge.litertlm.Backend
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig

val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm", // Replace with your model path
    backend = Backend.GPU(), // Or Backend.NPU(nativeLibraryDir = "...")
    // Optional: Pick a writable dir. This can improve 2nd load time.
    // cacheDir = "/tmp/" or context.cacheDir.path (for Android)
)

val engine = Engine(engineConfig)
engine.initialize()
// ... Use the engine to create a conversation ...

// Close the engine when done
engine.close()

To use the GPU backend on Android, the app must explicitly request the native libraries it depends on by adding the following inside the <application> tag of AndroidManifest.xml:

  <application>
    <uses-native-library android:name="libvndksupport.so" android:required="false"/>
    <uses-native-library android:name="libOpenCL.so" android:required="false"/>
  </application>

To use the NPU backend, you may need to specify the directory that contains the NPU libraries. On Android, if the libraries are bundled with your app, set it to context.applicationInfo.nativeLibraryDir. For more about NPU native libraries, see LiteRT-LM NPU.

val engineConfig = EngineConfig(
    modelPath = modelPath,
    backend = Backend.NPU(nativeLibraryDir = context.applicationInfo.nativeLibraryDir)
)

Create a conversation

Once the engine is initialized, create a Conversation instance. You can pass a ConversationConfig to customize its behavior.

import com.google.ai.edge.litertlm.Contents
import com.google.ai.edge.litertlm.ConversationConfig
import com.google.ai.edge.litertlm.Message
import com.google.ai.edge.litertlm.SamplerConfig

// Optional: Configure the system instruction, initial messages, sampling
// parameters, etc.
val conversationConfig = ConversationConfig(
    systemInstruction = Contents.of("You are a helpful assistant."),
    initialMessages = listOf(
        Message.user("What is the capital city of the United States?"),
        Message.model("Washington, D.C."),
    ),
    samplerConfig = SamplerConfig(topK = 10, topP = 0.95, temperature = 0.8),
)

val conversation = engine.createConversation(conversationConfig)
// Or with default config:
// val conversation = engine.createConversation()

// ... Use the conversation ...

// Close the conversation when done
conversation.close()

Conversation implements AutoCloseable, so for one-off or short-lived conversations you can use a use block to release resources automatically:

engine.createConversation(conversationConfig).use { conversation ->
    // Interact with the conversation
}
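Because `use` closes its receiver when the block exits, whether normally or by an exception, nesting an engine and a conversation closes them in reverse order of creation: the conversation first, then the engine. The sketch below checks this with hypothetical `FakeEngine` and `FakeConversation` classes that only record events; they are not part of LiteRT-LM.

```kotlin
// Hypothetical stand-ins that record lifecycle events instead of holding native resources.
class FakeEngine(private val log: MutableList<String>) : AutoCloseable {
    fun createConversation(): FakeConversation {
        log += "conversation opened"
        return FakeConversation(log)
    }

    override fun close() {
        log += "engine closed"
    }
}

class FakeConversation(private val log: MutableList<String>) : AutoCloseable {
    override fun close() {
        log += "conversation closed"
    }
}

// Runs `body` inside nested `use` blocks and returns the recorded events.
fun runNested(body: () -> Unit): List<String> {
    val log = mutableListOf<String>()
    try {
        FakeEngine(log).use { engine ->
            engine.createConversation().use {
                body()
            }
        }
    } catch (e: IllegalStateException) {
        log += "caught: ${e.message}"
    }
    return log
}
```

Both resources are closed before the exception reaches the outer `catch`, which is the same guarantee you get when nesting the real `Engine` and `Conversation` in `use` blocks.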

Send messages

There are three ways to send messages:

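sendMessage(contents): Message: A synchronous call that blocks until the model returns the complete response. It is the simplest option for basic request/response interactions.
sendMessageAsync(contents, callback): An asynchronous call for streaming responses. It suits long-running requests and cases where you want to display the response as it is generated.
sendMessageAsync(contents): Flow<Message>: An asynchronous call that returns a Kotlin Flow for streaming the response. This is the recommended approach for coroutine users.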

Synchronous example:

import com.google.ai.edge.litertlm.Content
import com.google.ai.edge.litertlm.Message

print(conversation.sendMessage("What is the capital of France?"))

Asynchronous example with a callback:

Use sendMessageAsync to send a message to the model and receive the response through a callback.

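import com.google.ai.edge.litertlm.Message
import com.google.ai.edge.litertlm.MessageCallback
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit

// Released once streaming ends, whether it succeeds or fails.
val done = CountDownLatch(1)

val callback = object : MessageCallback {
    override fun onMessage(message: Message) {
        print(message) // Called for each streamed chunk of the response
    }

    override fun onDone() {
        // Streaming completed
        done.countDown()
    }

    override fun onError(throwable: Throwable) {
        // Error during streaming
        println("Error: $throwable")
        done.countDown()
    }
}

conversation.sendMessageAsync("What is the capital of France?", callback)

// Optional: wait for the full response, e.g. in a CLI tool or test (never on the UI thread).
done.await(60, TimeUnit.SECONDS)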

Asynchronous example with a Flow:

Use sendMessageAsync without the callback argument to send a message to the model and receive the response as a Kotlin Flow.

import kotlinx.coroutines.flow.catch

// Within a coroutine scope
conversation.sendMessageAsync("What is the capital of France?")
    .catch { e -> println("Error: $e") } // Error during streaming
    .collect { print(it.toString()) }

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is recommended for all workloads on the GPU backend.

To use MTP, enable speculative decoding through ExperimentalFlags before initializing the engine.

import com.google.ai.edge.litertlm.ExperimentalApi
import com.google.ai.edge.litertlm.ExperimentalFlags
import com.google.ai.edge.litertlm.Backend
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig

// Enable MTP via speculative decoding
@OptIn(ExperimentalApi::class)
ExperimentalFlags.enableSpeculativeDecoding = true

val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm",
    backend = Backend.GPU(),
)

val engine = Engine(engineConfig)
engine.initialize()

// Create a Conversation and send messages with the same steps as above...
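For intuition about where the speedup comes from: in speculative decoding a lightweight drafter proposes several upcoming tokens, and the main model verifies all of them in a single forward pass. The sketch below shows the common greedy acceptance rule on plain token IDs. It is a generic illustration of the technique, not LiteRT-LM's implementation, and `verifyDraft` is a hypothetical name.

```kotlin
// Greedy verification step of speculative decoding (generic illustration).
//
// draft:      k tokens proposed by the drafter.
// targetNext: the main model's greedy (argmax) token at each of the k draft
//             positions plus one extra position, all computed in one forward
//             pass over prompt + draft, so targetNext.size == k + 1.
//
// Returns the tokens to append: the longest draft prefix the main model agrees
// with, followed by one token from the main model (its correction at the first
// mismatch, or a "bonus" token when every draft token was accepted).
fun verifyDraft(draft: List<Int>, targetNext: List<Int>): List<Int> {
    require(targetNext.size == draft.size + 1) { "targetNext must have draft.size + 1 entries" }
    val accepted = mutableListOf<Int>()
    for (i in draft.indices) {
        if (draft[i] == targetNext[i]) {
            accepted += draft[i]
        } else {
            accepted += targetNext[i] // Main model's correction; the rest of the draft is discarded.
            return accepted
        }
    }
    accepted += targetNext[draft.size] // Every draft token matched: keep the bonus token.
    return accepted
}
```

Each verification pass therefore emits between 1 and k + 1 tokens instead of exactly one, and with greedy acceptance the output is identical to what the main model would produce decoding one token at a time.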

Multimodality

A Message object can contain different types of Content, including Text, ImageBytes, ImageFile, AudioBytes, and AudioFile.

// Initialize the `visionBackend`, `audioBackend`, or both
val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm", // Replace with your model path
    backend = Backend.CPU(), // Or Backend.GPU() or Backend.NPU(...)
    visionBackend = Backend.GPU(), // Or Backend.NPU(...)
    audioBackend = Backend.CPU(), // Or Backend.NPU(...)
)

// Sends a message with multi-modality.
// See the Content class for other variants.
conversation.sendMessage(Contents.of(
    Content.ImageFile("/path/to/image"),
    Content.AudioBytes(audioBytes), // ByteArray of the audio
    Content.Text("Describe this image and audio."),
))

Define and use tools

There are two ways to define tools:

Using Kotlin functions (recommended for most cases)
Using OpenAPI specifications (full control over the tool specification and execution)

Define tools with Kotlin functions

You can define custom Kotlin functions as tools that the model can call to perform actions or fetch information.

Create a class that implements ToolSet, annotate its methods with @Tool, and annotate their parameters with @ToolParam.

import com.google.ai.edge.litertlm.Tool
import com.google.ai.edge.litertlm.ToolParam
import com.google.ai.edge.litertlm.ToolSet

class SampleToolSet : ToolSet {
    @Tool(description = "Get the current weather for a city")
    fun getCurrentWeather(
        @ToolParam(description = "The city name, e.g., San Francisco") city: String,
        @ToolParam(description = "Optional country code, e.g., US") country: String? = null,
        @ToolParam(description = "Temperature unit (celsius or fahrenheit). Default: celsius") unit: String = "celsius"
    ): Map<String, Any> {
        // In a real application, you would call a weather API here
        return mapOf("temperature" to 25, "unit" to unit, "condition" to "Sunny")
    }

    @Tool(description = "Get the sum of a list of numbers.")
    fun sum(
        @ToolParam(description = "The numbers, could be floating point.") numbers: List<Double>,
    ): Double {
        return numbers.sum()
    }
}

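The API inspects these annotations and the function signatures to generate an OpenAPI-style schema, which describes to the language model the tool's functionality, its parameters (including their types and the descriptions from @ToolParam), and its return type.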
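To illustrate the general idea, here is a minimal sketch of annotation-driven schema generation using plain Java reflection and two hypothetical annotations, `DemoTool` and `DemoParam`. It is not LiteRT-LM's implementation, and it is deliberately incomplete: Java reflection cannot see Kotlin nullability or default values, so the sketch emits no `required` list, and it does not describe list element types.

```kotlin
import java.lang.reflect.Method

// Hypothetical annotations for this sketch only (not LiteRT-LM's @Tool/@ToolParam).
@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.FUNCTION)
annotation class DemoTool(val description: String)

// Java reflection does not expose Kotlin parameter names by default,
// so this sketch carries the name in the annotation.
@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.VALUE_PARAMETER)
annotation class DemoParam(val name: String, val description: String)

// Maps a JVM parameter type to a JSON Schema type name.
fun jsonType(type: Class<*>): String = when (type) {
    String::class.java -> "string"
    Int::class.javaPrimitiveType, Int::class.javaObjectType -> "integer"
    Float::class.javaPrimitiveType, Float::class.javaObjectType,
    Double::class.javaPrimitiveType, Double::class.javaObjectType -> "number"
    Boolean::class.javaPrimitiveType, Boolean::class.javaObjectType -> "boolean"
    else -> if (List::class.java.isAssignableFrom(type)) "array" else "object"
}

fun quote(s: String): String = "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

// Builds one OpenAPI-style function description from an annotated method.
fun describe(method: Method): String {
    val tool = method.getAnnotation(DemoTool::class.java)
        ?: error("${method.name} is not annotated with @DemoTool")
    val properties = method.parameters.joinToString(",") { p ->
        val param = p.getAnnotation(DemoParam::class.java)
            ?: error("A parameter of ${method.name} lacks @DemoParam")
        "${quote(param.name)}:{\"type\":${quote(jsonType(p.type))},\"description\":${quote(param.description)}}"
    }
    return "{\"name\":${quote(method.name)},\"description\":${quote(tool.description)}," +
        "\"parameters\":{\"type\":\"object\",\"properties\":{$properties}}}"
}

// Describes every @DemoTool method of `toolSet`, sorted by name for stable output.
fun describeTools(toolSet: Any): List<String> =
    toolSet.javaClass.declaredMethods
        .filter { it.isAnnotationPresent(DemoTool::class.java) }
        .sortedBy { it.name }
        .map(::describe)

class DemoToolSet {
    @DemoTool(description = "Get the sum of a list of numbers.")
    fun sum(
        @DemoParam("numbers", "The numbers, could be floating point.") numbers: List<Double>,
    ): Double = numbers.sum()
}
```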

Parameter types

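Parameters annotated with @ToolParam can be of type String, Int, Boolean, Float, Double, or a List of these types (for example, List<String>). Use a nullable type (for example, String?) for a nullable parameter. To mark a parameter as optional, give it a default value and mention that default in the @ToolParam description.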

Return types

A tool function can return any Kotlin type. The result is converted to a JSON element before it is sent back to the model:

List types are converted to JSON arrays.
Map types are converted to JSON objects.
Primitive types (String, Number, Boolean) are converted to the corresponding JSON primitives.
Other types are converted to strings with their toString() method.

For structured data, return a Map or a data class, which is converted to a JSON object.
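The conversion rules above can be sketched as a small recursive function. `toJson` is a hypothetical helper written from the listed rules only, not the library's converter: it does not reproduce the data-class handling, it assumes `null` becomes JSON `null` (the rules do not say), and it does not guard against non-finite doubles.

```kotlin
// Converts a tool result to JSON text using the rules listed above.
// Assumption: null becomes JSON null. NaN and Infinity are not valid JSON
// and are not handled here.
fun toJson(value: Any?): String = when (value) {
    null -> "null"
    is String -> quoteJson(value)
    is Boolean, is Number -> value.toString()
    is List<*> -> value.joinToString(",", "[", "]") { toJson(it) }
    is Map<*, *> -> value.entries.joinToString(",", "{", "}") { (k, v) ->
        quoteJson(k.toString()) + ":" + toJson(v)
    }
    else -> quoteJson(value.toString()) // Everything else falls back to toString().
}

// Minimal JSON string escaping: quotes, backslashes, and control characters.
fun quoteJson(s: String): String {
    val sb = StringBuilder("\"")
    for (c in s) {
        when {
            c == '"' -> sb.append("\\\"")
            c == '\\' -> sb.append("\\\\")
            c < ' ' -> sb.append(String.format("\\u%04x", c.code))
            else -> sb.append(c)
        }
    }
    return sb.append('"').toString()
}
```

Applied to the map returned by `getCurrentWeather` above, this produces `{"temperature":25,"unit":"celsius","condition":"Sunny"}`.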

Define tools with OpenAPI specifications

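Alternatively, you can define a tool by implementing the OpenApiTool class and providing the tool's description as a JSON string that conforms to the OpenAPI specification. This approach is useful when you already have an OpenAPI schema for the tool or need fine-grained control over its definition.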

import com.google.ai.edge.litertlm.OpenApiTool

class SampleOpenApiTool : OpenApiTool {

    override fun getToolDescriptionJsonString(): String {
        return """
        {
          "name": "addition",
          "description": "Add all numbers.",
          "parameters": {
            "type": "object",
            "properties": {
              "numbers": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "description": "The list of numbers to sum."
              }
            },
            "required": [
              "numbers"
            ]
          }
        }
        """.trimIndent() // Tip: trim to save tokens
    }

    override fun execute(paramsJsonString: String): String {
        // Parse paramsJsonString with your choice of parser or deserializer and
        // execute the tool.

        // Return the result as a JSON string
        return """{"result": 1.4142}"""
    }
}
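The `execute` body above is left as a placeholder. As a rough sketch of what it might do for the `addition` tool, here is a standalone function that extracts the `numbers` array and returns the sum as JSON. The regular expression only handles a flat array of plain numbers and is for illustration only; in real code use a JSON library, such as `org.json.JSONObject` (bundled with Android) or kotlinx.serialization on the JVM. `executeAddition` is a hypothetical name.

```kotlin
// Toy version of the "addition" tool's execute step, for illustration only.
// Matches "numbers": [ ... ] and captures the comma-separated contents.
val numbersArray = Regex("\"numbers\"\\s*:\\s*\\[([^\\]]*)\\]")

fun executeAddition(paramsJsonString: String): String {
    val match = numbersArray.find(paramsJsonString)
        ?: return """{"error": "missing 'numbers' array"}"""
    val numbers = match.groupValues[1]
        .split(',')
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .map { it.toDouble() } // Throws NumberFormatException on non-numeric input.
    // Return the result as a JSON string, as execute() is expected to.
    return """{"result": ${numbers.sum()}}"""
}
```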

Register tools

Include the tool instances in the ConversationConfig.

val conversation = engine.createConversation(
    ConversationConfig(
        tools = listOf(
            tool(SampleToolSet()),
            tool(SampleOpenApiTool()),
        ),
        // ... other configs
    )
)

// Send messages that might trigger the tool
conversation.sendMessageAsync("What's the weather like in London?", callback)

The model decides when to call a tool based on the conversation. Tool execution results are automatically sent back to the model to generate the final response.

Manual tool calling

By default, LiteRT-LM executes the tool calls the model generates and automatically sends the results back to the model to generate the next response.

To execute tools manually and send the results back to the model yourself, set automaticToolCalling in ConversationConfig to false.

val conversation = engine.createConversation(
    ConversationConfig(
        tools = listOf(
            tool(SampleOpenApiTool()),
        ),
        automaticToolCalling = false,
    )
)

When automatic tool calling is disabled, your application code must execute the tools and send the results back to the model itself. With automaticToolCalling set to false, the execute method of an OpenApiTool is not called automatically.

// Send a message that triggers a tool call.
val responseMessage = conversation.sendMessage("What's the weather like in London?")

// The model returns a Message with `toolCalls` populated.
if (responseMessage.toolCalls.isNotEmpty()) {
    val toolResponses = mutableListOf<Content.ToolResponse>()
    // There can be multiple tool calls in a single response.
    for (toolCall in responseMessage.toolCalls) {
        println("Model wants to call: ${toolCall.name} with arguments: ${toolCall.arguments}")

        // Execute the tool manually with your own logic. `executeTool` is just an example here.
        val toolResponseJson = executeTool(toolCall.name, toolCall.arguments)

        // Collect tool responses.
        toolResponses.add(Content.ToolResponse(toolCall.name, toolResponseJson))
    }

    // Use Message.tool to create the tool response message.
    val toolResponseMessage = Message.tool(Contents.of(toolResponses))

    // Send the tool response message to the model.
    val finalMessage = conversation.sendMessage(toolResponseMessage)
    println("Final answer: ${finalMessage.text}") // e.g., "The weather in London is 25c."
}
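`executeTool` in the example above is yours to write. One hypothetical shape is a lookup table from tool name to handler that reports failures as JSON instead of throwing. The sketch assumes the arguments arrive as a `Map<String, Any?>` and that a JSON string is an acceptable tool response; check the actual types of `toolCall.arguments` and `Content.ToolResponse` before adapting it.

```kotlin
// Hypothetical handler type: takes the tool-call arguments and returns a JSON string.
typealias ToolHandler = (Map<String, Any?>) -> String

// Hypothetical registry from tool name to handler; the names mirror SampleToolSet above.
val toolHandlers: Map<String, ToolHandler> = mapOf(
    "getCurrentWeather" to { args: Map<String, Any?> ->
        // A real app would look up args["city"] with a weather service here.
        """{"temperature": 25, "unit": "celsius", "condition": "Sunny"}"""
    },
    "sum" to { args: Map<String, Any?> ->
        val numbers = (args["numbers"] as? List<*>).orEmpty().filterIsInstance<Number>()
        """{"result": ${numbers.sumOf { it.toDouble() }}}"""
    },
)

// Runs the requested tool, turning failures into JSON the model can read instead of crashing.
fun executeTool(name: String, arguments: Map<String, Any?>): String {
    val handler = toolHandlers[name] ?: return """{"error": "unknown tool"}"""
    return try {
        handler(arguments)
    } catch (e: Exception) {
        """{"error": "${e.javaClass.simpleName}"}"""
    }
}
```

Returning an error object gives the model a chance to see that the call failed and respond accordingly, rather than aborting the whole exchange.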

Example

To try tool use, clone the repository and run example/ToolMain.kt:

bazel run -c opt //kotlin/java/com/google/ai/edge/litertlm/example:tool -- <abs_model_path>

Error handling

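API methods can throw LiteRtLmJniException for errors in the native layer and standard Kotlin exceptions (such as IllegalStateException) for lifecycle issues, so always wrap API calls in try-catch blocks. The onError callback of MessageCallback also reports errors that occur during asynchronous operations.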
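The last point matters because a failure that occurs after an asynchronous call has returned cannot reach a `try`/`catch` around that call; it can only be delivered through a callback such as `onError`. The sketch below demonstrates this with a hypothetical `fakeSendAsync` built on a plain thread; it is not LiteRT-LM code.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicReference
import kotlin.concurrent.thread

// Hypothetical async call: returns immediately and reports failures later, from a
// worker thread, through onError (the role MessageCallback.onError plays).
fun fakeSendAsync(prompt: String, onDone: () -> Unit, onError: (Throwable) -> Unit) {
    thread(name = "fake-worker") {
        try {
            check(prompt.isNotBlank()) { "prompt must not be blank" }
            onDone()
        } catch (t: Throwable) {
            onError(t)
        }
    }
}

// Returns whether a surrounding try/catch saw the failure, and what onError received.
fun demoAsyncError(): Pair<Boolean, Throwable?> {
    val finished = CountDownLatch(1)
    val asyncError = AtomicReference<Throwable?>(null)
    var caughtSynchronously = false
    try {
        fakeSendAsync(
            prompt = "",
            onDone = { finished.countDown() },
            onError = { t ->
                asyncError.set(t)
                finished.countDown()
            },
        )
    } catch (e: IllegalStateException) {
        caughtSynchronously = true // Never reached: the failure happens on the worker thread.
    }
    finished.await(5, TimeUnit.SECONDS)
    return caughtSynchronously to asyncError.get()
}
```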