LiteRT delivers high-performance deployment of Generative AI models across mobile, desktop, and web platforms. By leveraging hardware acceleration on CPUs, GPUs, and NPUs, LiteRT provides state-of-the-art performance for on-device GenAI inference.
You can deploy complex GenAI models using the following integrated tech stack:
Torch Generative API: A Python module within the AI Edge Torch library for authoring and converting PyTorch GenAI models. It provides optimized building blocks that ensure high-performance execution on devices. See Convert PyTorch GenAI models for more details, and the conversion sketch after this list.
LiteRT-LM: A specialized orchestration layer built on top of LiteRT that manages LLM-specific complexities such as session cloning, KV-cache management, prompt caching/scoring, and stateful inference. See the LiteRT-LM GitHub repo for more details.
LiteRT Converter and Runtime: The foundational engine that provides efficient model conversion, runtime execution, and optimization, enabling hardware acceleration across CPU, GPU, and NPU.
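To illustrate the conversion flow, here is a minimal sketch using the AI Edge Torch convert/export API. It assumes the `ai-edge-torch` package is installed; `TinyModel` is a hypothetical placeholder standing in for a real GenAI model, and `tiny_model.tflite` is an arbitrary output path.

```python
# Minimal sketch: converting a PyTorch module to a LiteRT model
# with AI Edge Torch. Assumes `pip install ai-edge-torch`.
import torch
import ai_edge_torch


class TinyModel(torch.nn.Module):
    """Hypothetical placeholder model; a real GenAI model would be
    authored with the Torch Generative API building blocks."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.nn.functional.softmax(self.linear(x), dim=-1)


model = TinyModel().eval()
sample_inputs = (torch.randn(1, 16),)

# Trace and convert the module, then export a .tflite flatbuffer.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("tiny_model.tflite")
```

The returned edge model is also directly callable, so you can compare its outputs against the original PyTorch module before exporting.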
LiteRT GenAI Model Zoo
LiteRT supports a growing collection of popular open-weight models on the LiteRT Hugging Face Community. These models are pre-converted and tuned for immediate deployment, giving you peak performance on CPUs, GPUs, and NPUs out of the box (see the runtime sketch after the model list below).
- Gemma Family
  - Gemma 3 270M
  - Gemma 3 1B
  - Gemma 3n E2B/E4B
  - EmbeddingGemma 300M
  - FunctionGemma 270M
- Qwen Family
- Llama
- Phi
- SmolLM
- FastVLM
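Once you have a converted .tflite model, whether downloaded from the model zoo or exported yourself, you can run it with the LiteRT Python interpreter. The following is a minimal sketch, assuming the `ai-edge-litert` package is installed; "model.tflite" is a placeholder path, and a production GenAI deployment would typically go through LiteRT-LM rather than raw tensor I/O.

```python
# Minimal sketch: running a pre-converted .tflite model with the
# LiteRT Python runtime. Assumes `pip install ai-edge-litert`.
import numpy as np
from ai_edge_litert.interpreter import Interpreter

# "model.tflite" is a placeholder for any converted LiteRT model.
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a random tensor matching the model's expected input signature.
input_data = np.random.randn(*input_details[0]["shape"]).astype(
    input_details[0]["dtype"]
)
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])
print(output.shape)
```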
Featured Insights
- MediaTek NPU and LiteRT: Powering the next generation of on-device AI
- Unlocking Peak Performance on Qualcomm NPU with LiteRT
- On-device GenAI in Chrome, Chromebook Plus, and Pixel Watch with LiteRT-LM
- On-device small language models with multimodality, RAG, and Function Calling
- Gemma 3 on mobile and web with Google AI Edge