Google Kubernetes Engine (GKE) with Gemma

Google Kubernetes Engine (GKE) is a managed Kubernetes service from Google Cloud that you can use to deploy and operate containerized applications at scale using Google's infrastructure. You can serve Gemma using Cloud Tensor Processing Units (TPUs) and graphics processing units (GPUs) on GKE with these LLM serving frameworks:

By serving Gemma on GKE, you can implement a robust, production-ready inference serving solution with all the benefits of managed Kubernetes, including efficient scalability and high availability.
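
As a rough illustration of what a GPU-backed serving workload on GKE can look like, the following Python sketch builds a Kubernetes Deployment manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts. It is a minimal sketch under stated assumptions, not a recommended configuration: the function name `gemma_deployment`, the deployment name, the container image, the port, and the choice of an `nvidia-l4` accelerator are all hypothetical placeholders. Substitute the values documented for the serving framework you choose.

```python
import json


def gemma_deployment(name, image, gpu_count=1, accelerator="nvidia-l4", replicas=1):
    """Build a Deployment manifest (as a dict) for a GPU-backed model server.

    All argument values are illustrative assumptions; none come from a
    specific Gemma serving guide.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule pods onto GKE nodes that have this accelerator type.
                    "nodeSelector": {"cloud.google.com/gke-accelerator": accelerator},
                    "containers": [
                        {
                            "name": "inference-server",
                            "image": image,
                            # Assumed port; use the one your server listens on.
                            "ports": [{"containerPort": 8000}],
                            # Request GPUs through the NVIDIA device plugin resource.
                            "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                        }
                    ],
                },
            },
        },
    }


# Hypothetical image reference, used only to make the sketch runnable.
manifest = gemma_deployment("gemma-server", "example.com/hypothetical/llm-server:latest")
print(json.dumps(manifest, indent=2))
```

You could save the printed JSON to a file and apply it to a cluster that has a matching GPU node pool. Increasing `replicas` is how the Deployment scales out to more serving pods.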

To learn more, refer to the following pages: