Deploy GenAI Models with LiteRT

LiteRT delivers high-performance deployment for Generative AI models across mobile, desktop, and web platforms. By leveraging hardware acceleration across CPUs, GPUs, and NPUs, LiteRT provides state-of-the-art performance for on-device GenAI inference.

You can deploy complex GenAI models using the following integrated tech stack:

  • Torch Generative API: A Python module within the AI Edge Torch library for authoring and converting PyTorch GenAI models. It provides optimized building blocks that ensure high-performance execution on devices; a minimal conversion sketch follows this list. See Convert PyTorch GenAI models for more details.

  • LiteRT-LM: A specialized orchestration layer built on top of LiteRT that manages LLM-specific complexities such as session cloning, KV-cache management, prompt caching/scoring, and stateful inference. See the LiteRT-LM GitHub repo for more details.

  • LiteRT Converter and Runtime: The foundational engine that provides efficient model conversion, runtime execution, and optimization, enabling hardware acceleration across CPU, GPU, and NPU; a minimal runtime sketch also follows this list.
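
The conversion flow can be sketched as follows. This is a minimal example of the general ai_edge_torch.convert path rather than the full Torch Generative API pipeline; the toy TinyClassifier module and the output filename are placeholders for illustration, not names from the LiteRT docs. It assumes the ai-edge-torch package is installed (pip install ai-edge-torch).

```python
# Minimal AI Edge Torch conversion sketch. TinyClassifier and the
# output filename are placeholders, not part of the LiteRT docs.
import torch
import ai_edge_torch


class TinyClassifier(torch.nn.Module):
    """Toy PyTorch model standing in for a real GenAI model."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 10)

    def forward(self, x):
        return torch.nn.functional.softmax(self.linear(x), dim=-1)


model = TinyClassifier().eval()
sample_inputs = (torch.randn(1, 128),)

# Trace the PyTorch module and produce a LiteRT (TFLite) model.
edge_model = ai_edge_torch.convert(model, sample_inputs)

# Sanity-check: run the converted model and compare against PyTorch.
torch_out = model(*sample_inputs)
edge_out = edge_model(*sample_inputs)
print(torch.allclose(torch_out, torch.from_numpy(edge_out), atol=1e-5))

# Serialize to a .tflite flatbuffer for on-device deployment.
edge_model.export("tiny_classifier.tflite")
```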

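Once a model is converted, the LiteRT runtime executes it on device. The following sketch runs the exported flatbuffer with the Python Interpreter from the ai-edge-litert package; the model filename carries over from the conversion sketch above and is illustrative.

```python
# Minimal LiteRT runtime inference sketch. Assumes the ai-edge-litert
# package is installed (pip install ai-edge-litert) and that
# tiny_classifier.tflite was produced by the conversion sketch above.
import numpy as np
from ai_edge_litert.interpreter import Interpreter

interpreter = Interpreter(model_path="tiny_classifier.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a random tensor matching the model's expected input shape/dtype.
x = np.random.randn(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()

probs = interpreter.get_tensor(output_details[0]["index"])
print(probs.shape, probs.sum())
```
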
LiteRT GenAI Model Zoo

LiteRT supports a growing collection of popular open-weight models on the LiteRT Hugging Face Community. These models are pre-converted and tuned for immediate deployment, enabling peak performance on CPUs, GPUs, and NPUs out-of-the-box; a download sketch follows the list below.

  • Gemma Family
    • Gemma 3 270M
    • Gemma 3 1B
    • Gemma 3n E2B/E4B
    • EmbeddingGemma 300M
    • Function Gemma 270M
  • Qwen Family
  • Llama
  • Phi
  • SmolLM
  • FastVLM
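
Fetching one of these pre-converted models can be as simple as the following sketch. It assumes the huggingface_hub package is installed (pip install huggingface_hub); the repo id and filename below are illustrative guesses at a LiteRT Community release, so check the model card on the LiteRT Hugging Face Community page for the actual names.

```python
# Minimal download sketch for a pre-converted LiteRT Community model.
# The repo_id and filename are assumptions for illustration; consult
# the model card on huggingface.co/litert-community for real values.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="litert-community/Gemma3-1B-IT",  # assumed repo id
    filename="gemma3-1b-it-int4.task",        # assumed filename
)
print("Model downloaded to:", model_path)
```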

Featured Insights