LiteRT-LM overview

LiteRT-LM is a production-ready, open-source inference framework designed for deploying high-performance, cross-platform LLMs on edge devices.

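Cross-platform support: Runs on Android, iOS, web, desktop, and IoT (for example, Raspberry Pi).
Hardware acceleration: Uses GPU and NPU accelerators across a range of hardware for peak performance and system stability.
Multimodality: Build with LLMs that support vision and audio.
Tool use: Supports function calling for agentic workflows, with constrained decoding for improved accuracy.
Broad model support: Runs Gemma, Llama, Phi-4, Qwen, and other models.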

On-device generative AI showcase

Google AI Edge Gallery screenshot

Google AI Edge Gallery is an experimental app that showcases on-device generative AI running fully offline, powered by LiteRT-LM.

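Google Play: Use local LLMs on supported Android devices.
App Store: Experience on-device AI on iOS devices.
GitHub source: View the source code of the Gallery app to see how to integrate LiteRT-LM into your own project.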

Featured model: Gemma-4-E2B

Model size: 2.58 GB

For additional technical details, see the Hugging Face model card.

Platform (device)	Backend	Prefill (tk/s)	Decode (tk/s)	Time to first token (s)	Peak CPU memory (MB)
Android (S26 Ultra)	CPU	557	47	1.8	1733
Android (S26 Ultra)	GPU	3808	52	0.3	676
iOS (iPhone 17 Pro)	CPU	532	25	1.9	607
iOS (iPhone 17 Pro)	GPU	2878	56	0.3	1450
Linux (Arm 2.3 and 2.8 GHz, NVIDIA GeForce RTX 4090)	CPU	260	35	4	1628
Linux (Arm 2.3 and 2.8 GHz, NVIDIA GeForce RTX 4090)	GPU	11234	143	0.1	913
macOS (MacBook Pro M4)	CPU	901	42	1.1	736
macOS (MacBook Pro M4)	GPU	7835	160	0.1	1623
IoT (Raspberry Pi 5 16GB)	CPU	133	8	7.8	1546
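
The figures above can be combined into a rough per-request latency estimate. The sketch below assumes a simple two-phase model in which the prompt is processed at the prefill rate and the output is generated at the decode rate; it ignores model load time, sampling overhead, and thermal throttling. The function name `estimate_latency_s` and the example token counts are illustrative and are not part of LiteRT-LM.

```python
# Rough latency model (an assumption, not a LiteRT-LM API):
#   total time is approximately
#   prompt_tokens / prefill_rate + output_tokens / decode_rate

# (prefill tk/s, decode tk/s) copied from the benchmark table above.
BENCHMARKS = {
    ("Android (S26 Ultra)", "CPU"): (557, 47),
    ("Android (S26 Ultra)", "GPU"): (3808, 52),
    ("IoT (Raspberry Pi 5 16GB)", "CPU"): (133, 8),
}


def estimate_latency_s(platform, backend, prompt_tokens, output_tokens):
    """Return (prefill_seconds, decode_seconds) for one request."""
    prefill_rate, decode_rate = BENCHMARKS[(platform, backend)]
    return prompt_tokens / prefill_rate, output_tokens / decode_rate


if __name__ == "__main__":
    # Hypothetical workload: 1,024 prompt tokens and 256 generated tokens.
    for platform, backend in BENCHMARKS:
        prefill_s, decode_s = estimate_latency_s(platform, backend, 1024, 256)
        print(f"{platform} {backend}: prefill {prefill_s:.1f} s, "
              f"decode {decode_s:.1f} s, total {prefill_s + decode_s:.1f} s")
```

For a 1,024-token prompt, the estimated prefill time on the S26 Ultra GPU is about 0.3 s, which is in line with the time-to-first-token column, although the prompt length used for the published measurements is not stated here.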

Start building

The following code snippets show how to get started with the LiteRT-LM CLI and the Python, Kotlin, and C++ APIs.

CLI

litert-lm run model.litertlm --prompt="What is the capital of France?"

Python

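import litert_lm

# Load the model file and create the inference engine.
engine = litert_lm.Engine("model.litertlm")

# The conversation is closed automatically when the block exits.
with engine.create_conversation() as conversation:
    response = conversation.send_message("What is the capital of France?")
    print(f"Response: {response['content'][0]['text']}")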
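
The Python snippet reads the reply as `response['content'][0]['text']`, which only looks at the first content part. A small helper can make that access more defensive. This is a sketch that assumes the response is a plain dict whose `content` is a list of parts and that text parts carry a `text` key, as the snippet suggests; the helper name `response_text` and the sample dict are hypothetical, and a real response may contain other fields.

```python
def response_text(response):
    """Concatenate the text of every content part that has a 'text' key.

    Assumes the shape used in the snippet above:
    {"content": [{"text": "..."}, ...]}. Parts without a 'text' key
    are skipped.
    """
    parts = response.get("content", [])
    return "".join(part["text"] for part in parts if "text" in part)


if __name__ == "__main__":
    # Hand-written sample that mimics the assumed response shape.
    sample = {"content": [{"text": "Paris."}]}
    print(f"Response: {response_text(sample)}")
```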

Kotlin

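// Configure the engine with the model path and the CPU backend.
val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm",
    backend = Backend.CPU(),
)

// Create the engine and load the model.
val engine = Engine(engineConfig)
engine.initialize()

// Start a conversation and print the model's reply.
val conversation = engine.createConversation()
print(conversation.sendMessage("What is the capital of France?"))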

C++

// model_path is the path to the .litertlm file and must be defined by the caller.
auto model_assets = ModelAssets::Create(model_path);
CHECK_OK(model_assets);

// Build engine settings for the CPU backend.
auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU);

// Create the engine; each step returns a status that is checked before use.
absl::StatusOr<std::unique_ptr<Engine>> engine = Engine::CreateEngine(engine_settings);
CHECK_OK(engine);

// Create a conversation with the default configuration.
auto conversation_config = ConversationConfig::CreateDefault(**engine);
CHECK_OK(conversation_config);
absl::StatusOr<std::unique_ptr<Conversation>> conversation = Conversation::Create(**engine, *conversation_config);
CHECK_OK(conversation);

// Send a user message and wait for the model's reply.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", "What is the capital of France?"}
    });
CHECK_OK(model_message);

std::cout << *model_message << std::endl;

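Language	Status	Best for	Documentation
CLI	🚀 Early preview	Getting started with LiteRT-LM in under a minute.	CLI guide
Python	✅ Stable	Rapid prototyping and development on desktop and Raspberry Pi.	Python guide
Kotlin	✅ Stable	Native Android apps and JVM-based desktop tools. Optimized for coroutines.	Android (Kotlin) guide
C++	✅ Stable	High-performance, cross-platform core logic and embedded systems.	C++ guide
Swift	🚀 In development	Native iOS and macOS integration with dedicated Metal support.	Coming soon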

Supported backends and platforms

Acceleration	Android	iOS	macOS	Windows	Linux	IoT
CPU	✅	✅	✅	✅	✅	✅
GPU	✅	✅	✅	✅	✅	-
NPU	✅	-	-	-	-	-
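
The support matrix can be encoded as a lookup so that an application falls back to a backend that is listed for the current platform. The sketch below copies the table into a dict; the preference order (NPU, then GPU, then CPU) and the function name `pick_backend` are assumptions for illustration, and actual availability also depends on the specific device and model.

```python
# Backend availability per platform, copied from the table above.
SUPPORTED = {
    "CPU": {"Android", "iOS", "macOS", "Windows", "Linux", "IoT"},
    "GPU": {"Android", "iOS", "macOS", "Windows", "Linux"},
    "NPU": {"Android"},
}

# Assumed preference order: most specialized accelerator first.
DEFAULT_PREFERENCE = ("NPU", "GPU", "CPU")


def pick_backend(platform, preference=DEFAULT_PREFERENCE):
    """Return the first preferred backend listed as supported on `platform`."""
    for backend in preference:
        if platform in SUPPORTED[backend]:
            return backend
    raise ValueError(f"No supported backend for platform: {platform!r}")


if __name__ == "__main__":
    for name in ("Android", "iOS", "IoT"):
        print(name, "->", pick_backend(name))
```

Passing a custom `preference` tuple, such as `("GPU", "CPU")`, skips the NPU on devices where it is listed but not wanted.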

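Supported models

The following table lists the models supported by LiteRT-LM. For more detailed performance numbers and model cards, visit the LiteRT Community on Hugging Face.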

Model	Type	Size (MB)	Details	Device	CPU prefill (tk/s)	CPU decode (tk/s)	GPU prefill (tk/s)	GPU decode (tk/s)
Gemma4-E2B	Chat	2583	Model Card	Samsung S26 Ultra	557	47	3808	52
				iPhone 17 Pro	532	25	2878	57
				MacBook Pro M4	901	42	7835	160
Gemma4-E4B	Chat	3654	Model Card	Samsung S26 Ultra	195	18	1293	22
				iPhone 17 Pro	159	10	1189	25
				MacBook Pro M4	277	27	2560	101
Gemma-3n-E2B	Chat	2965	Model Card	MacBook Pro M3	233	28	-	-
				Samsung S24 Ultra	111	16	816	16
Gemma-3n-E4B	Chat	4235	Model Card	MacBook Pro M3	170	20	-	-
				Samsung S24 Ultra	74	9	548	9
Gemma3-1B	Chat	1005	Model Card	Samsung S24 Ultra	177	33	1191	24
FunctionGemma	Base	289	Model Card	Samsung S25 Ultra	2238	154	-	-
phi-4-mini	Chat	3906	Model Card	Samsung S24 Ultra	67	7	314	10
Qwen2.5-1.5B	Chat	1598	Model Card	Samsung S25 Ultra	298	34	1668	31
Qwen3-0.6B	Chat	586	Model Card	Vivo X300 Pro	165	9	580	21
Qwen2.5-0.5B	Chat	521	Model Card	Samsung S24 Ultra	251	30	-	-
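
One way to use the table is to shortlist models for a target device. The sketch below takes the rows measured on the Samsung S24 Ultra, keeps the models whose file size fits a budget, and orders them by CPU decode speed. The function name `candidates` and the 1,500 MB budget are illustrative assumptions, and file size is only a rough proxy for runtime memory use.

```python
# (model, size in MB, CPU decode tk/s) for rows measured on the
# Samsung S24 Ultra, copied from the table above.
S24_ULTRA_ROWS = [
    ("Gemma-3n-E2B", 2965, 16),
    ("Gemma-3n-E4B", 4235, 9),
    ("Gemma3-1B", 1005, 33),
    ("phi-4-mini", 3906, 7),
    ("Qwen2.5-0.5B", 521, 30),
]


def candidates(rows, max_size_mb):
    """Return model names that fit the size budget, fastest CPU decode first."""
    fitting = [row for row in rows if row[1] <= max_size_mb]
    fitting.sort(key=lambda row: row[2], reverse=True)
    return [name for name, _size, _decode in fitting]


if __name__ == "__main__":
    # Hypothetical budget of 1,500 MB for the model file.
    print(candidates(S24_ULTRA_ROWS, max_size_mb=1500))
```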

Report issues

If you encounter a bug or have a feature request, report it on LiteRT-LM GitHub Issues.