LiteRT-LM Python API

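The Python API for LiteRT-LM runs on Linux and macOS (Windows support is coming soon). It supports multimodal input, tool use, and GPU acceleration.

Introduction

Here is an example terminal chat application built with the Python API: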

import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR) # Hide log for TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
  with engine.create_conversation() as conversation:
    while True:
      user_input = input("\n>>> ")
      for chunk in conversation.send_message_async(user_input):
        print(chunk["content"][0]["text"], end="", flush=True)

Getting started

LiteRT-LM is available as a Python library. You can install the nightly build from PyPI:

# Using pip
pip install litert-lm-api-nightly

# Using uv
uv pip install litert-lm-api-nightly

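Initialize the engine

Engine is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the with statement) ensures that native resources are released promptly, including when the body raises an exception.

Note: Initializing the engine can take several seconds while the model loads.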

import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU (default) or Backend.GPU.
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
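To illustrate why the with statement matters here, the following is a minimal sketch of context-manager cleanup. FakeEngine is a hypothetical stand-in written for this example, not part of litert_lm; it only shows that the exit hook runs even when the body fails.

```python
class FakeEngine:
    """Hypothetical stand-in for an object that owns a native resource."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.closed = False

    def __enter__(self) -> "FakeEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and also when the body raises.
        self.closed = True
        return False  # Do not swallow the exception.


engine_ref = None
try:
    with FakeEngine("path/to/your/model.litertlm") as engine:
        engine_ref = engine
        raise RuntimeError("simulated failure while using the engine")
except RuntimeError:
    pass

# The resource was released even though the body raised.
released_after_error = engine_ref.closed
```

Without the with statement, the same guarantee would require an explicit try/finally around every use of the engine.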

Create a conversation

Conversation manages the state and history of your interaction with the model.

# Optional: Configure system instruction and initial messages
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass
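The message dictionaries above are verbose to write by hand. As a small convenience sketch, the hypothetical helper text_message below (not part of litert_lm) builds a message in the same shape as the example on this page.

```python
from typing import Any


def text_message(role: str, text: str) -> dict[str, Any]:
    """Build a single-part text message in the shape shown on this page."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


# Equivalent to the hand-written system message in the example above.
messages = [text_message("system", "You are a helpful assistant.")]
```

The resulting list can then be passed as messages=messages to engine.create_conversation, as in the example above.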

Send messages

You can send messages synchronously or asynchronously (streaming).

Synchronous example:

# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])

# Or with full message structure
# response = conversation.send_message({"role": "user", "content": "..."})

Asynchronous (streaming) example:

# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
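If you want the full reply as one string instead of printing it piece by piece, the chunks can be accumulated. The sketch below assumes each chunk carries only the newly generated text (not the cumulative reply) and uses hand-written stand-in chunks; collect_text is a hypothetical helper, not part of litert_lm.

```python
from typing import Any, Iterable


def collect_text(chunks: Iterable[dict[str, Any]]) -> str:
    """Concatenate the text items from an iterable of response chunks."""
    parts = []
    for chunk in chunks:
        # Chunks without a "content" key are skipped.
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return "".join(parts)


# Stand-in chunks for illustration only. In real use, pass the iterator
# returned by conversation.send_message_async(...).
fake_stream = [
    {"content": [{"type": "text", "text": "Once upon "}]},
    {},
    {"content": [{"type": "text", "text": "a time."}]},
]
full_text = collect_text(fake_stream)
```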

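🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. We recommend MTP for all tasks on the GPU backend.

To use MTP, enable speculative decoding when you initialize the engine.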

import litert_lm

# Enable MTP by setting enable_speculative_decoding=True
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    enable_speculative_decoding=True,
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("What is the capital of France?")
        print(response["content"][0]["text"])
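To check the speedup on your own model and hardware, you could time the same prompt with and without enable_speculative_decoding. The helper below is a generic timing sketch using only the standard library; timed is a hypothetical name, and the workload shown is a stand-in rather than a real model call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in workload. In real use, wrap a call such as:
#   lambda: conversation.send_message("What is the capital of France?")
result, seconds = timed(lambda: sum(range(1000)))
```

A single run is noisy, so repeating the measurement a few times and comparing medians gives a fairer picture.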

Multimodality

# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU,
    vision_backend=litert_lm.Backend.GPU,
) as engine:
    with engine.create_conversation() as conversation:
        user_message = {
            "role": "user",
            "content": [
                {"type": "audio", "path": "/path/to/audio.wav"},
                {"type": "text", "text": "Describe this audio."},
            ],
        }
        response = conversation.send_message(user_message)
        print(response["content"][0]["text"])

Define and use tools

You can define Python functions as tools that the model can call automatically.

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])

LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
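To see what information a docstring and type hints make available, here is a rough sketch using the standard inspect and typing modules. It is not LiteRT-LM's actual schema generator, and the output layout is an assumption made for illustration; describe_tool is a hypothetical helper.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def _type_name(hint: Any) -> str:
    """Return a readable name for a type hint."""
    return getattr(hint, "__name__", str(hint))


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Summarize a function's name, docstring summary, and annotated types."""
    hints = get_type_hints(fn)
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        # First docstring line serves as the short description.
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {
            name: _type_name(hints[name]) if name in hints else "Any"
            for name in inspect.signature(fn).parameters
        },
        "returns": _type_name(hints["return"]) if "return" in hints else "Any",
    }


def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b


schema = describe_tool(add_numbers)
```

This suggests why clear docstrings and complete type hints are worth writing: they are the only description of the tool that the model receives.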