隆重推出 Google AI Edge Portal：大规模对边缘 AI 进行基准测试。注册以在非公开预览期间申请访问权限。

Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

LiteRT-LM CLI

借助命令行界面 (CLI)，您可以立即测试模型，无需编写任何代码。

支持的平台：

Linux
macOS
Windows
Raspberry Pi

安装

方法 1：`uvx`（推荐用于快速测试）

立即运行 litert-lm，无需永久安装。需要 uv。

您可以在任何 litert-lm 命令前添加 uvx 前缀，以便按需运行该命令：

uvx litert-lm run --help

方法 2：`uv`（持久安装）

将 litert-lm 安装为系统范围的二进制文件。需要 uv。

uv tool install litert-lm

方法 3：`pip`

在虚拟环境中进行标准安装。使用 --upgrade 可确保您获得最新版本，即使之前已安装过某个版本也是如此。

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade litert-lm

正在升级

如需将 litert-lm 升级到最新版本，请执行以下操作：

如果使用 `uvx`（方法 1）

无需执行任何操作。uvx 会自动运行最新版本。

如果使用 `uv`（方法 2）安装

uv tool upgrade litert-lm

如果通过 `pip`（方法 3）安装

激活虚拟环境并运行：

pip install --upgrade litert-lm

聊天

从 HuggingFace 下载并运行模型：

litert-lm run  \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --prompt="What is the capital of France?"

🔴 新功能：多令牌预测 (MTP)

多令牌预测 (MTP) 是一种性能优化技术，可显著提高解码速度。建议在 GPU 后端上针对所有任务普遍使用 MTP。

如需在 CLI 中启用 MTP，请使用 --enable-speculative-decoding=true 标志：

litert-lm run  \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --backend=gpu \
  --enable-speculative-decoding=true \
  --prompt="What is the capital of France?"

函数调用 / 工具

您可以使用预设运行工具。创建 preset.py：

import datetime
import base64

def get_current_time() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

system_instruction = "You are a helpful assistant with access to tools."
tools = [get_current_time]

使用预设值运行：

litert-lm run  \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --preset=preset.py

提示和互动式输出示例：

> what will the time be in two hours?
[tool_call] {"arguments": {}, "name": "get_current_time"}
[tool_response] {"name": "get_current_time", "response": "2026-03-25 21:54:07"}
The current time is 2026-03-25 21:54:07.

In two hours, it will be **2026-03-25 23:54:07**.

这里发生了什么？

当您提出需要外部信息（例如当前时间）的问题时，模型会识别出它需要调用工具。

模型发出 tool_call：模型输出 JSON 请求以调用 get_current_time 函数。
CLI 执行工具：LiteRT-LM CLI 会拦截此调用，并执行 preset.py 中定义的相应 Python 函数。
CLI 发送 tool_response：CLI 将结果发送回模型。
模型生成最终答案：模型使用工具响应来计算并生成用户的最终答案。

此“函数调用”循环会在 CLI 中自动进行，让您无需编写任何复杂的编排代码，即可使用 Python 功能增强本地 LLM。

Python、C++ 和 Kotlin API 也提供相同的功能。

正在卸载

如需卸载 litert-lm，请执行以下操作：

如果使用 `uvx`（方法 1）

无需执行任何操作。uvx 从临时缓存运行，不会永久安装。

如果使用 `uv`（方法 2）安装

uv tool uninstall litert-lm

如果通过 `pip`（方法 3）安装

pip uninstall litert-lm

LiteRT-LM CLI

安装

方法 1：uvx（推荐用于快速测试）

方法 2：uv（持久安装）

方法 3：pip

正在升级

如果使用 uvx（方法 1）

如果使用 uv（方法 2）安装

如果通过 pip（方法 3）安装

聊天

🔴 新功能：多令牌预测 (MTP)

函数调用 / 工具

这里发生了什么？

正在卸载

如果使用 uvx（方法 1）

如果使用 uv（方法 2）安装

如果通过 pip（方法 3）安装

方法 1：`uvx`（推荐用于快速测试）

方法 2：`uv`（持久安装）

方法 3：`pip`

如果使用 `uvx`（方法 1）

如果使用 `uv`（方法 2）安装

如果通过 `pip`（方法 3）安装

如果使用 `uvx`（方法 1）

如果使用 `uv`（方法 2）安装

如果通过 `pip`（方法 3）安装