Production-ready, open-source inference framework designed to deliver high-performance, cross-platform LLM deployments on edge devices.

Spotlight

Bring state-of-the-art agentic skills to the edge with Gemma 4.

Why LiteRT-LM?

Deploy LLMs across Android, iOS, Web, and Desktop.
Maximize performance with GPU and NPU acceleration.
Support popular LLMs as well as multimodality (Vision, Audio) and Tool Use.

Start building

Python APIs with hardware acceleration on Linux, macOS, Windows, and Raspberry Pi.
Native Android apps and JVM-based desktop tools.
Native iOS and macOS integration with specialized Metal support (Swift APIs coming soon).
Cross-platform C++ APIs.

Join the Community

Contribute to the open-source project, report issues, and see examples.
Download pre-converted models (Gemma, Qwen, and more), and join the discussion.

Blogs and Announcements

Deploy Gemma 4 in-app and across a broader range of devices with stellar performance using LiteRT-LM.
Deploy language models at scale on wearables and browser-based platforms using LiteRT-LM.
Explore how to fine-tune FunctionGemma and enable function calling capabilities powered by LiteRT-LM Tool Use APIs.
Latest insights on RAG, multimodality, and function calling for edge language models.