The Google Cloud platform provides many options for deploying, serving, and fine-tuning Gemma 4 open models, including the following:
- Vertex AI Model Garden
- Cloud Run
- Google Kubernetes Engine (GKE)
- Agent Development Kit (ADK)
- Vertex AI Training Clusters (VTC)
- MaxText
- vLLM with TPUs
- Sovereign Cloud
Vertex AI Model Garden
Vertex AI is a Google Cloud platform for rapidly building and scaling machine learning projects. Gemma 4 is available in Model Garden, a curated collection of models on Vertex AI. You can test and deploy models directly from the console.
To learn more, refer to the following pages:
- Introduction to Vertex AI: Get started with Vertex AI.
- Gemma with Vertex AI: Use Gemma open models with Vertex AI.
Cloud Run
Cloud Run is a fully managed platform to run your code or containers on top of Google's highly scalable infrastructure. Deploy Gemma 4 on Cloud Run using GPUs for scale-to-zero, pay-per-use inference.
For larger mode sizes, leverage advanced configurations with RTX 6000 Pro GPUs and Model Streaming.
Google Kubernetes Engine (GKE)
Google Kubernetes Engine (GKE) is a managed Kubernetes service from Google Cloud. Run Gemma 4 on GKE for enterprise-grade container orchestration. Use TPUs and GPUs to serve models with high throughput and low latency.
Agent Development Kit (ADK)
Build and orchestrate AI agents with Gemma 4 and the Agent Development Kit (ADK). Gemma 4's strong reasoning and function-calling capabilities make it ideal for agentic workflows.
Vertex AI Training Clusters (VTC)
Fine-tune Gemma 4 using Vertex AI Training Clusters (VTC). VTC provides optimized infrastructure for large-scale training and fine-tuning of open models.
vLLM with TPUs
Serve Gemma 4 on Google Cloud TPUs for state-of-the-art serving performance.
MaxText
Gemma 4 is supported in MaxText, a high-performance, arbitrary-sized JAX LLM implementation for Google Cloud TPUs.
Sovereign Cloud
Gemma 4 is available on Sovereign Cloud solutions, providing enhanced control and compliance for sensitive workloads.