Deploy GenAI Models with LiteRT

LiteRT delivers high-performance deployment for Generative AI models across mobile, desktop, and web platforms. By leveraging hardware acceleration across CPUs, GPUs, and NPUs, LiteRT provides state-of-the-art performance for on-device GenAI inference.

You can deploy complex GenAI models using the following integrated tech stack:

  • Torch Generative API: A Python module within the AI Edge Torch library for authoring and converting PyTorch GenAI models. It provides optimized building blocks that ensure high-performance execution on devices; a minimal conversion sketch follows this list. See Convert PyTorch GenAI models for more details.

  • LiteRT-LM: A specialized orchestration layer built on top of LiteRT that manages LLM-specific complexities such as session cloning, KV-cache management, prompt caching/scoring, and stateful inference. See the LiteRT-LM GitHub repo for more details.

  • LiteRT Converter and Runtime: The foundational engine that provides efficient model conversion, runtime execution, and optimization, enabling hardware acceleration across CPU, GPU, and NPU; a minimal runtime sketch also follows this list.
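
The conversion flow can be sketched as follows. This is a minimal example of the general ai_edge_torch.convert path rather than the full Torch Generative API pipeline; the toy TinyClassifier module and the output filename are placeholders for illustration, not names from the LiteRT docs. It assumes the ai-edge-torch package is installed (pip install ai-edge-torch).

```python
# Minimal AI Edge Torch conversion sketch. TinyClassifier and the
# output filename are placeholders, not part of the LiteRT docs.
import torch
import ai_edge_torch


class TinyClassifier(torch.nn.Module):
    """Toy PyTorch model standing in for a real GenAI model."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 10)

    def forward(self, x):
        return torch.nn.functional.softmax(self.linear(x), dim=-1)


model = TinyClassifier().eval()
sample_inputs = (torch.randn(1, 128),)

# Trace the PyTorch module and produce a LiteRT (TFLite) model.
edge_model = ai_edge_torch.convert(model, sample_inputs)

# Sanity-check: run the converted model and compare against PyTorch.
torch_out = model(*sample_inputs)
edge_out = edge_model(*sample_inputs)
print(torch.allclose(torch_out, torch.from_numpy(edge_out), atol=1e-5))

# Serialize to a .tflite flatbuffer for on-device deployment.
edge_model.export("tiny_classifier.tflite")
```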

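Once a model is converted, the LiteRT runtime executes it on device. The following sketch runs the exported flatbuffer with the Python Interpreter from the ai-edge-litert package; the model filename carries over from the conversion sketch above and is illustrative.

```python
# Minimal LiteRT runtime inference sketch. Assumes the ai-edge-litert
# package is installed (pip install ai-edge-litert) and that
# tiny_classifier.tflite was produced by the conversion sketch above.
import numpy as np
from ai_edge_litert.interpreter import Interpreter

interpreter = Interpreter(model_path="tiny_classifier.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a random tensor matching the model's expected input shape/dtype.
x = np.random.randn(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()

probs = interpreter.get_tensor(output_details[0]["index"])
print(probs.shape, probs.sum())
```
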
LiteRT GenAI Model Zoo

LiteRT supports a growing collection of popular open-weight models on the LiteRT Hugging Face Community. These models are pre-converted and tuned for immediate deployment, enabling peak performance on CPUs, GPUs, and NPUs out-of-the-box; a download sketch follows the list below.

  • Gemma Family
    • Gemma 3 270M
    • Gemma 3 1B
    • Gemma 3n E2B/E4B
    • EmbeddingGemma 300M
    • Function Gemma 270M
  • Qwen Family
  • Llama
  • Phi
  • SmolLM
  • FastVLM
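
Fetching one of these pre-converted models can be as simple as the following sketch. It assumes the huggingface_hub package is installed (pip install huggingface_hub); the repo id and filename below are illustrative guesses at a LiteRT Community release, so check the model card on the LiteRT Hugging Face Community page for the actual names.

```python
# Minimal download sketch for a pre-converted LiteRT Community model.
# The repo_id and filename are assumptions for illustration; consult
# the model card on huggingface.co/litert-community for real values.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="litert-community/Gemma3-1B-IT",  # assumed repo id
    filename="gemma3-1b-it-int4.task",        # assumed filename
)
print("Model downloaded to:", model_path)
```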

Featured Insights