Deploy Gemma with Google Cloud

The Google Cloud platform provides many options for deploying, serving, and fine-tuning Gemma 4 open models, including the following:

Vertex AI Model Garden

Vertex AI is a Google Cloud platform for rapidly building and scaling machine learning projects. Gemma 4 is available in Model Garden, a curated collection of models on Vertex AI. You can test and deploy models directly from the console.

To learn more, refer to the following pages:

Cloud Run

Cloud Run is a fully managed platform to run your code or containers on top of Google's highly scalable infrastructure. Deploy Gemma 4 on Cloud Run using GPUs for scale-to-zero, pay-per-use inference.

For larger mode sizes, leverage advanced configurations with RTX 6000 Pro GPUs and Model Streaming.

Google Kubernetes Engine (GKE)

Google Kubernetes Engine (GKE) is a managed Kubernetes service from Google Cloud. Run Gemma 4 on GKE for enterprise-grade container orchestration. Use TPUs and GPUs to serve models with high throughput and low latency.

Agent Development Kit (ADK)

Build and orchestrate AI agents with Gemma 4 and the Agent Development Kit (ADK). Gemma 4's strong reasoning and function-calling capabilities make it ideal for agentic workflows.

Vertex AI Training Clusters (VTC)

Fine-tune Gemma 4 using Vertex AI Training Clusters (VTC). VTC provides optimized infrastructure for large-scale training and fine-tuning of open models.

vLLM with TPUs

Serve Gemma 4 on Google Cloud TPUs for state-of-the-art serving performance.

MaxText

Gemma 4 is supported in MaxText, a high-performance, arbitrary-sized JAX LLM implementation for Google Cloud TPUs.

Sovereign Cloud

Gemma 4 is available on Sovereign Cloud solutions, providing enhanced control and compliance for sensitive workloads.