Deploy Gemma with Google Cloud

Google Cloud provides many services for deploying and serving Gemma open models, including the following:

Vertex AI

Vertex AI is a Google Cloud platform for rapidly building and scaling machine learning projects without requiring in-house MLOps expertise. Vertex AI provides a console where you can work with a large selection of models and offers end-to-end MLOps capabilities and a serverless experience for streamlined development.

You can use Vertex AI as the downstream application that serves Gemma, which is available in Model Garden, a curated collection of models. For example, you could port weights from a Gemma implementation and use Vertex AI to serve that version of Gemma to get predictions.
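As a minimal sketch, the following shows how you might request predictions from a Gemma model that has already been deployed to a Vertex AI endpoint using the `google-cloud-aiplatform` SDK. The project ID, region, endpoint ID, and the `prompt`/`max_tokens` instance fields are illustrative assumptions; the exact instance schema depends on the serving container you deploy from Model Garden.

```python
# Sketch: query a deployed Gemma endpoint on Vertex AI.
# Assumes google-cloud-aiplatform is installed and a Gemma model from
# Model Garden is already deployed to an endpoint in your project.
from google.cloud import aiplatform

# Hypothetical project, region, and endpoint values -- replace with your own.
aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# The instance schema ("prompt", "max_tokens") depends on the serving
# container; consult the container's documentation for the exact fields.
response = endpoint.predict(
    instances=[{"prompt": "Why is the sky blue?", "max_tokens": 128}]
)
print(response.predictions[0])
```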

To learn more, refer to the following pages:

Google Kubernetes Engine (GKE)

Google Kubernetes Engine (GKE) is a managed Kubernetes service from Google Cloud that you can use to deploy and operate containerized applications at scale using Google's infrastructure. You can serve Gemma using Cloud Tensor Processing Units (TPUs) and graphics processing units (GPUs) on GKE with LLM serving frameworks.

By serving Gemma on GKE, you can implement a robust, production-ready inference serving solution with all the benefits of managed Kubernetes, including efficient scalability and high availability.
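As a rough sketch of the client side, the snippet below queries a Gemma server running on GKE. It assumes you have deployed a serving framework that exposes an OpenAI-compatible completions API (vLLM is used here as one example) behind a Kubernetes Service, and that the service has been made reachable locally, for instance with `kubectl port-forward`. The service name, port, and model ID are assumptions, not fixed values.

```python
# Sketch: call a Gemma model served on GKE via an OpenAI-compatible API
# (e.g., vLLM's server). Assumes the Kubernetes Service is reachable at the
# hypothetical address below, for example after running:
#   kubectl port-forward service/vllm-service 8000:8000
import requests

SERVICE_URL = "http://localhost:8000/v1/completions"  # hypothetical endpoint

payload = {
    "model": "google/gemma-2b",        # model ID the server was launched with
    "prompt": "Why is the sky blue?",
    "max_tokens": 128,
    "temperature": 0.7,
}
response = requests.post(SERVICE_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```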

To learn more, refer to the following pages:

Dataflow ML

Dataflow ML is a Google Cloud platform for deploying and managing complete machine learning workflows. With Dataflow ML, you can prepare your data for model training with data processing tools, then use models like Gemma to perform local and remote inference with batch and streaming pipelines.

You can use Dataflow ML to integrate Gemma into your Apache Beam inference pipelines with a few lines of code, enabling you to ingest, validate, and transform data, feed text inputs into Gemma, and generate text output.
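The following is a minimal sketch of such a pipeline using Apache Beam's `RunInference` transform with the Hugging Face pipeline model handler. It assumes `apache-beam[gcp]` and `transformers` are installed and that you have access to the Gemma weights on Hugging Face; the model ID and prompt are illustrative.

```python
# Sketch: a Gemma text-generation step inside an Apache Beam pipeline,
# runnable locally or on Dataflow with the appropriate pipeline options.
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.huggingface_inference import (
    HuggingFacePipelineModelHandler,
)

# Wraps a Hugging Face text-generation pipeline around the Gemma weights.
model_handler = HuggingFacePipelineModelHandler(
    task="text-generation",
    model="google/gemma-2b",  # assumed Gemma model ID on Hugging Face
)

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "Prompts" >> beam.Create(["Why is the sky blue?"])
        | "Gemma" >> RunInference(model_handler)
        # Each PredictionResult pairs the input example with the model output.
        | "Print" >> beam.Map(print)
    )
```

Running the same pipeline on Dataflow rather than locally is a matter of supplying Dataflow pipeline options (runner, project, region, and a staging location); the transform graph itself does not change.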

To learn more, refer to the following pages: