Stay organized with collections
Save and categorize content based on your preferences.
Google Kubernetes Engine (GKE) is
a managed Kubernetes service from Google Cloud that
you can use to deploy and operate containerized applications at scale using
Google's infrastructure. You can serve Gemma using Cloud Tensor processing units
(TPUs) and graphical processing units (GPUs) on GKE with these LLM serving
frameworks:
By serving Gemma on GKE, you can implement a robust, production-ready inference
serving solution with all the benefits of managed Kubernetes, including
efficient scalability and higher availability.
To learn more, refer to the following pages:
GKE
overview:
Get started with Google Kubernetes Engine (GKE)