You can run Gemma models completely on-device with the MediaPipe LLM Inference API. The LLM Inference API acts as a wrapper for large language models, enabling you to run Gemma models on-device for common text-to-text generation tasks such as information retrieval, email drafting, and document summarization.
Try the LLM Inference API with MediaPipe Studio, a web-based application for evaluating and customizing on-device models.
For more information on deploying Gemma to web browsers with the LLM Inference API, see the LLM Inference guide for Web. To learn more about the MediaPipe LLM Inference capabilities, see the LLM inference guide.
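As a rough illustration of what browser-side usage looks like, the sketch below loads the LLM Inference task from the `@mediapipe/tasks-genai` package and runs a single text-to-text generation with a Gemma model. The model file path, CDN URL, and tuning values (`maxTokens`, `temperature`, `topK`) are placeholders you would adapt to your deployment; see the LLM Inference guide for Web for the authoritative setup steps.

```javascript
// Hypothetical sketch: the model path and tuning values below are
// placeholders, not recommended settings.
const llmOptions = {
  baseOptions: {
    // Path to a Gemma model prepared for the LLM Inference API (placeholder).
    modelAssetPath: '/assets/gemma-2b-it-gpu-int8.bin',
  },
  maxTokens: 512,   // upper bound on input + output tokens
  temperature: 0.8, // sampling randomness
  topK: 40,         // sample from the 40 most likely tokens
};

async function draftEmail(promptText) {
  // Loaded lazily so this module can be parsed outside a browser.
  const { FilesetResolver, LlmInference } = await import('@mediapipe/tasks-genai');

  // Fetch the WASM runtime files that back the GenAI tasks (URL is a placeholder).
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
  );

  // Create the on-device LLM and run one generation; the model never
  // leaves the device and no server round trip is involved.
  const llm = await LlmInference.createFromOptions(genai, llmOptions);
  return llm.generateResponse(promptText);
}
```

A caller would simply `await draftEmail('Write a short reply confirming the meeting.')` and render the returned string.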