LiteRT-LM overview

LiteRT-LM is a production-ready, open-source inference framework designed for high-performance, cross-platform LLM deployment on edge devices.

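Cross-platform support: Runs on Android, iOS, web, desktop, and IoT (for example, Raspberry Pi).
Hardware acceleration: Uses GPU and NPU accelerators across a range of hardware for peak performance and system stability.
Multimodality: Build with LLMs that support vision and audio input.
Tool use: Supports function calling for agentic workflows, with constrained decoding to improve accuracy.
Broad model support: Runs Gemma, Llama, Phi-4, Qwen, and more.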

On-device generative AI showcase

[Figure: Google AI Edge Gallery screenshot]

Google AI Edge Gallery is an experimental app that showcases on-device generative AI capabilities running fully offline with LiteRT-LM.

Google Play: Use LLMs locally on supported Android devices.
App Store: Experience on-device AI on iOS devices.
GitHub source: View the Gallery app's source code to learn how to integrate LiteRT-LM into your own projects.

Featured model: Gemma-4-E2B

Model size: 2.58 GB

Additional technical details are available in the Hugging Face model card.

Platform (device)	Backend	Prefill (tokens/s)	Decode (tokens/s)	Time to first token (s)	Peak CPU memory (MB)
Android (S26 Ultra)	CPU	557	47	1.8	1733
Android (S26 Ultra)	GPU	3808	52	0.3	676
iOS (iPhone 17 Pro)	CPU	532	25	1.9	607
iOS (iPhone 17 Pro)	GPU	2878	56	0.3	1450
Linux (Arm 2.3 and 2.8 GHz, NVIDIA GeForce RTX 4090)	CPU	260	35	4	1628
Linux (Arm 2.3 and 2.8 GHz, NVIDIA GeForce RTX 4090)	GPU	11234	143	0.1	913
macOS (MacBook Pro M4)	CPU	901	42	1.1	736
macOS (MacBook Pro M4)	GPU	7835	160	0.1	1623
IoT (Raspberry Pi 5 16GB)	CPU	133	8	7.8	1546
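
To make the table easier to compare, the following Python sketch computes the GPU-over-CPU prefill speedup and a rough end-to-end response time from the figures above. The latency formula (time to first token plus the remaining tokens at the decode rate) is a simplifying assumption, not something the benchmark defines: it assumes a prompt similar to the benchmark's and a constant decode rate. The function names and the shortened Linux label are illustrative only.

```python
# Figures copied from the Gemma-4-E2B benchmark table.
# Each value: (prefill tokens/s, decode tokens/s, time to first token in s).
# The Linux platform label is shortened here for readability.
BENCHMARKS = {
    ("Android (S26 Ultra)", "CPU"): (557, 47, 1.8),
    ("Android (S26 Ultra)", "GPU"): (3808, 52, 0.3),
    ("iOS (iPhone 17 Pro)", "CPU"): (532, 25, 1.9),
    ("iOS (iPhone 17 Pro)", "GPU"): (2878, 56, 0.3),
    ("Linux (RTX 4090)", "CPU"): (260, 35, 4.0),
    ("Linux (RTX 4090)", "GPU"): (11234, 143, 0.1),
    ("macOS (MacBook Pro M4)", "CPU"): (901, 42, 1.1),
    ("macOS (MacBook Pro M4)", "GPU"): (7835, 160, 0.1),
    ("IoT (Raspberry Pi 5 16GB)", "CPU"): (133, 8, 7.8),
}


def prefill_speedup(platform: str) -> float:
    """Return GPU prefill throughput divided by CPU prefill throughput."""
    gpu_prefill = BENCHMARKS[(platform, "GPU")][0]
    cpu_prefill = BENCHMARKS[(platform, "CPU")][0]
    return gpu_prefill / cpu_prefill


def estimated_response_seconds(platform: str, backend: str, new_tokens: int) -> float:
    """Estimate the time to produce `new_tokens` output tokens.

    Assumption: the first token arrives at the measured time to first token,
    and every later token arrives at the measured decode rate.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    _, decode_tps, ttft = BENCHMARKS[(platform, backend)]
    return ttft + (new_tokens - 1) / decode_tps


# Print the prefill speedup for every platform that has both CPU and GPU rows.
for platform in sorted({p for (p, b) in BENCHMARKS if b == "GPU"}):
    print(f"{platform}: GPU prefill is {prefill_speedup(platform):.1f}x CPU")
```

Under this model, decode throughput dominates long responses, while prefill throughput and time to first token matter most for long prompts with short answers.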

Start building

The following snippets show how to get started with the LiteRT-LM CLI and the Python, Kotlin, and C++ APIs.

CLI

litert-lm run model.litertlm --prompt="What is the capital of France?"

Python

import litert_lm

# Load the model file and create the inference engine.
engine = litert_lm.Engine("model.litertlm")

with engine.create_conversation() as conversation:
    response = conversation.send_message("What is the capital of France?")
    print(f"Response: {response['content'][0]['text']}")
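
The Python snippet reads the reply with `response['content'][0]['text']`, which assumes the first content item is text. As a minimal sketch, a small helper can collect every text item instead. It runs without LiteRT-LM installed. The helper name `extract_text` and the `"type"` key in the sample data are hypothetical; only the `content` list and the `text` key come from the snippet above.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every content item that carries a string "text" value."""
    parts = []
    for item in response.get("content", []):
        # Assumption: items without a string "text" value (for example,
        # non-text parts) can be skipped safely.
        text = item.get("text") if isinstance(item, dict) else None
        if isinstance(text, str):
            parts.append(text)
    return "".join(parts)


# Hand-written stand-in for a real response; only the shape matters here.
sample_response = {
    "content": [
        {"type": "text", "text": "Paris"},
        {"type": "text", "text": "."},
    ]
}
print(extract_text(sample_response))  # prints "Paris."
```

With a real conversation, the same helper could be called as `extract_text(conversation.send_message(...))`, provided the response has the shape shown in the snippet.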

Kotlin

val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm",
    backend = Backend.CPU(),
)

val engine = Engine(engineConfig)
engine.initialize()

val conversation = engine.createConversation()
print(conversation.sendMessage("What is the capital of France?"))

C++

// `model_path` is assumed to hold the path to a .litertlm model file.
auto model_assets = ModelAssets::Create(model_path);
CHECK_OK(model_assets);

auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU);

absl::StatusOr<std::unique_ptr<Engine>> engine = Engine::CreateEngine(engine_settings);
CHECK_OK(engine);

auto conversation_config = ConversationConfig::CreateDefault(**engine);
CHECK_OK(conversation_config);
absl::StatusOr<std::unique_ptr<Conversation>> conversation = Conversation::Create(**engine, *conversation_config);
CHECK_OK(conversation);

absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", "What is the capital of France?"}
    });
CHECK_OK(model_message);

std::cout << *model_message << std::endl;

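Language	Status	Best for	Documentation
CLI	🚀 Early preview	Getting started with LiteRT-LM in under a minute	CLI guide
Python	✅ Stable	Rapid prototyping and development on desktop and Raspberry Pi	Python guide
Kotlin	✅ Stable	Native Android apps and JVM-based desktop tools; optimized for coroutines	Android (Kotlin) guide
C++	✅ Stable	High-performance, cross-platform core logic and embedded systems	C++ guide
Swift	🚀 In development	Native iOS and macOS integration with dedicated Metal support	Coming soon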

Supported backends and platforms

Acceleration	Android	iOS	macOS	Windows	Linux	IoT
CPU	✅	✅	✅	✅	✅	✅
GPU	✅	✅	✅	✅	✅	-
NPU	✅	-	-	-	-	-
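
The support matrix above can be encoded as a lookup with a simple fallback rule. This is a sketch, not part of LiteRT-LM: `choose_backend` and the default preference order (NPU, then GPU, then CPU) are assumptions for illustration. Platform-level support also does not guarantee that a particular device or model works with a given accelerator.

```python
# Platform-to-backend support, transcribed from the table above.
SUPPORTED_BACKENDS = {
    "Android": {"CPU", "GPU", "NPU"},
    "iOS": {"CPU", "GPU"},
    "macOS": {"CPU", "GPU"},
    "Windows": {"CPU", "GPU"},
    "Linux": {"CPU", "GPU"},
    "IoT": {"CPU"},
}

# Assumed preference order; adjust it to match your own measurements.
DEFAULT_PREFERENCE = ("NPU", "GPU", "CPU")


def choose_backend(platform: str, preference=DEFAULT_PREFERENCE) -> str:
    """Return the first backend in `preference` that the platform supports."""
    supported = SUPPORTED_BACKENDS.get(platform)
    if supported is None:
        raise ValueError(f"Unknown platform: {platform!r}")
    for backend in preference:
        if backend in supported:
            return backend
    raise ValueError(f"No backend in {preference!r} is supported on {platform}")


for name in SUPPORTED_BACKENDS:
    print(f"{name}: {choose_backend(name)}")
```

Passing a custom `preference`, such as `("GPU", "CPU")`, skips the NPU on Android when a model has not been prepared for it.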

Supported models

The following table lists the models that LiteRT-LM supports. For detailed performance numbers and model cards, see the LiteRT Community on Hugging Face.

Model	Type	Size (MB)	Details	Device	CPU prefill (tokens/s)	CPU decode (tokens/s)	GPU prefill (tokens/s)	GPU decode (tokens/s)
Gemma4-E2B	Chat	2583	Model card	Samsung S26 Ultra	557	47	3808	52
				iPhone 17 Pro	532	25	2878	57
				MacBook Pro M4	901	42	7835	160
Gemma4-E4B	Chat	3654	Model card	Samsung S26 Ultra	195	18	1293	22
				iPhone 17 Pro	159	10	1189	25
				MacBook Pro M4	277	27	2560	101
Gemma-3n-E2B	Chat	2965	Model card	MacBook Pro M3	233	28	-	-
				Samsung S24 Ultra	111	16	816	16
Gemma-3n-E4B	Chat	4235	Model card	MacBook Pro M3	170	20	-	-
				Samsung S24 Ultra	74	9	548	9
Gemma3-1B	Chat	1005	Model card	Samsung S24 Ultra	177	33	1191	24
FunctionGemma	Base	289	Model card	Samsung S25 Ultra	2238	154	-	-
phi-4-mini	Chat	3906	Model card	Samsung S24 Ultra	67	7	314	10
Qwen2.5-1.5B	Chat	1598	Model card	Samsung S25 Ultra	298	34	1668	31
Qwen3-0.6B	Chat	586	Model card	Vivo X300 Pro	165	9	580	21
Qwen2.5-0.5B	Chat	521	Model card	Samsung S24 Ultra	251	30	-	-
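
When an app has a fixed download or storage budget, the size column is often the first filter. The sketch below transcribes the model names, types, and sizes from the table, with type values written in English, and lists the models that fit a given budget. `ModelEntry` and `models_within` are hypothetical names, and file size is only a rough proxy for runtime memory use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelEntry:
    name: str
    kind: str  # "Chat" or "Base", as listed in the table
    size_mb: int


# Names, types, and sizes transcribed from the supported-models table.
MODELS = [
    ModelEntry("Gemma4-E2B", "Chat", 2583),
    ModelEntry("Gemma4-E4B", "Chat", 3654),
    ModelEntry("Gemma-3n-E2B", "Chat", 2965),
    ModelEntry("Gemma-3n-E4B", "Chat", 4235),
    ModelEntry("Gemma3-1B", "Chat", 1005),
    ModelEntry("FunctionGemma", "Base", 289),
    ModelEntry("phi-4-mini", "Chat", 3906),
    ModelEntry("Qwen2.5-1.5B", "Chat", 1598),
    ModelEntry("Qwen3-0.6B", "Chat", 586),
    ModelEntry("Qwen2.5-0.5B", "Chat", 521),
]


def models_within(size_budget_mb: int, kind: Optional[str] = None) -> List[ModelEntry]:
    """Return models no larger than the budget, smallest first.

    If `kind` is given, only models of that type are returned.
    """
    fits = [
        m for m in MODELS
        if m.size_mb <= size_budget_mb and (kind is None or m.kind == kind)
    ]
    return sorted(fits, key=lambda m: m.size_mb)


for model in models_within(1100, kind="Chat"):
    print(f"{model.name}: {model.size_mb} MB")
```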

Report issues

If you encounter a bug or have a feature request, report it in LiteRT-LM GitHub Issues.