Gemma 4 released with text, audio and image input and long up to 256K context window! Learn more

Deploy Gemma with Google Cloud

The Google Cloud platform provides many options for deploying, serving, and fine-tuning Gemma 4 open models, including the following:

Gemini Enterprise Agent Platform
Cloud Run
Google Kubernetes Engine (GKE)
Agent Development Kit (ADK)
Gemini Enterprise Agent Platform Training Clusters
MaxText
vLLM with TPUs
Sovereign Cloud

Gemini Enterprise Agent Platform

Gemini Enterprise Agent Platform is a Google Cloud platform for rapidly building and scaling machine learning projects. Gemma 4 is available in Model Garden, a curated collection of models on Gemini Enterprise Agent Platform. You can test and deploy models directly from the console.

To learn more, refer to the following pages:

Agent Platform overview: Get started with Gemini Enterprise Agent Platform.
Gemma with Gemini Enterprise Agent Platform: Use Gemma open models with Gemini Enterprise Agent Platform.

Cloud Run

Cloud Run is a fully managed platform to run your code or containers on top of Google's highly scalable infrastructure. Deploy Gemma 4 on Cloud Run using GPUs for scale-to-zero, pay-per-use inference.

For larger mode sizes, leverage advanced configurations with RTX 6000 Pro GPUs and Model Streaming.

Google Kubernetes Engine (GKE)

Google Kubernetes Engine (GKE) is a managed Kubernetes service from Google Cloud. Run Gemma 4 on GKE for enterprise-grade container orchestration. Use TPUs and GPUs to serve models with high throughput and low latency.

Agent Development Kit (ADK)

Build and orchestrate AI agents with Gemma 4 and the Agent Development Kit (ADK). Gemma 4's strong reasoning and function-calling capabilities make it ideal for agentic workflows.

Gemini Enterprise Agent Platform Training Clusters

Fine-tune Gemma 4 using Gemini Enterprise Agent Platform Training Clusters. Training Clusters provides optimized infrastructure for large-scale training and fine-tuning of open models.

vLLM with TPUs

Serve Gemma 4 on Google Cloud TPUs for state-of-the-art serving performance.

MaxText

Gemma 4 is supported in MaxText, a high-performance, arbitrary-sized JAX LLM implementation for Google Cloud TPUs.

Sovereign Cloud

Gemma 4 is available on Sovereign Cloud solutions, providing enhanced control and compliance for sensitive workloads.