This guide shows how to deploy Gemma 3 open models on Cloud Run with a
single click in Google AI Studio.
Google AI Studio is a browser-based platform that lets you quickly try out
models and experiment with different prompts. After you've entered a chat prompt
to design a prototype web app that uses the selected Gemma 3 model, you
can select Deploy to Cloud Run to run the Gemma model
on a GPU-enabled Cloud Run service.
By using Google AI Studio to deploy a generated front-end service to
Cloud Run, you skip most of the container setup, because Cloud Run
provides a prebuilt container for serving Gemma open models that supports the
Google Gen AI SDK.
Get started with Google AI Studio
This section guides you through deploying Gemma 3 to Cloud Run using Google AI
Studio.
1. Select a Gemma model in Google AI Studio. In the Run settings panel on
   the Chat page, use the default Gemma model, or select one of the Gemma
   models.
2. In the top bar, select View more actions and click Deploy to Cloud Run.
3. In the Deploy Gemma 3 on Google Cloud Run dialog, follow the prompts to
   create a new Google Cloud project, or select an existing project. You might
   be prompted to enable billing if there is no associated billing account.
4. After Google AI Studio verifies your project, click Deploy to Google Cloud.
5. After the Gemma 3 model has successfully deployed to Google Cloud, the
   dialog displays the following:
   - A Cloud Run endpoint URL of your Cloud Run service running Gemma 3 and
     Ollama.
   - A generated API key that is used for authentication with the Gemini API
     libraries. This key is configured as an environment variable of the
     deployed Cloud Run service to authorize incoming requests. We recommend
     that you replace the API key with IAM authentication.
   - A link to the Cloud Run service in the Google Cloud console. To learn
     about the default configuration settings for your Cloud Run service, go
     to the link, then select Edit & deploy new revision to view or modify the
     configuration settings.
6. To view the Gemini API sample code that was used to create the Cloud Run
   service, select Get Code.
7. Optional: Copy the code and make modifications as needed.
With your code, you can use the deployed Cloud Run endpoint and API key with the
Google Gen AI SDK.
For example, if you are using the Google Gen AI SDK for Python, the code might
look as follows:

```python
from google import genai
from google.genai.types import HttpOptions

# Configure the client to use your Cloud Run endpoint and API key
client = genai.Client(
    api_key="<YOUR_API_KEY>",
    http_options=HttpOptions(base_url="<cloud_run_url>"),
)

# Example: Generate content (non-streaming)
response = client.models.generate_content(
    model="<model>",  # Replace with the Gemma 3 model you selected in Google AI Studio, such as "gemma-3-1b-it".
    contents=["How does AI work?"],
)
print(response.text)

# Example: Stream generated content
response = client.models.generate_content_stream(
    model="<model>",  # Replace with the Gemma 3 model you selected in Google AI Studio, such as "gemma-3-1b-it".
    contents=["Write a story about a magic backpack. You are the narrator of an interactive text adventure game."],
)
for chunk in response:
    print(chunk.text, end="")
```
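Because the deployed service wraps Ollama behind a proxy that is compatible with
the Gemini API, you can also call it with plain HTTP instead of the SDK. The
sketch below is a minimal example, assuming the proxy follows the public Gemini
API REST convention (`/v1beta/models/{model}:generateContent` with an
`x-goog-api-key` header); the URL and key placeholders come from the deployment
dialog, and `build_payload` and `generate` are illustrative helper names, not
part of any SDK.

```python
import requests

# Placeholders from the deployment dialog.
CLOUD_RUN_URL = "<cloud_run_url>"
API_KEY = "<YOUR_API_KEY>"


def build_payload(prompt: str) -> dict:
    """Build a Gemini-style generateContent request body."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str, model: str = "gemma-3-1b-it") -> str:
    """Send a non-streaming generateContent request to the deployed service."""
    resp = requests.post(
        f"{CLOUD_RUN_URL}/v1beta/models/{model}:generateContent",
        headers={"x-goog-api-key": API_KEY, "Content-Type": "application/json"},
        json=build_payload(prompt),
        timeout=60,
    )
    resp.raise_for_status()
    # The first candidate's first text part holds the generated reply.
    return resp.json()["candidates"][0]["content"]["parts"][0]["text"]
```

After filling in the placeholders, `generate("How does AI work?")` returns the
model's reply as a string.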
Considerations
When you deploy a Cloud Run service from Google AI Studio, consider the
following:
- Pricing: Cloud Run is a billable component. To generate a cost estimate
  based on your projected usage, use the pricing calculator.
- Quota: Cloud Run automatically requests the Request Total Nvidia L4 GPU
  allocation, per project per region quota under the Cloud Run Admin API.
- App Proxy Server: The deployed service uses the Google AI Studio Gemini App
  Proxy Server to wrap Ollama and make your service compatible with the
  Gemini API.
- Permissions: If you need to modify your Cloud Run service, you must have the
  required IAM roles granted to your account on your project.
- Authentication: By default, when you deploy a Cloud Run service from
  Google AI Studio, the service is deployed with public (unauthenticated)
  access (the --allow-unauthenticated flag). To use a stronger security
  mechanism, we recommend that you authenticate with IAM.
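If you turn off unauthenticated access, callers must present a Google-signed ID
token whose audience is the service URL. A minimal sketch using the google-auth
library is shown below; the `<cloud_run_url>` placeholder comes from the
deployment dialog, and the two function names are illustrative helpers, not an
official API.

```python
def auth_header(token: str) -> dict:
    """Build the Authorization header that Cloud Run's IAM check expects."""
    return {"Authorization": f"Bearer {token}"}


def fetch_cloud_run_token(audience: str) -> str:
    """Mint an ID token for the Cloud Run service URL (the token audience).

    Requires ambient Google credentials, such as a service account; imports
    are deferred so the module loads even without google-auth installed.
    """
    import google.auth.transport.requests
    import google.oauth2.id_token

    request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(request, audience)


# Usage (with credentials available):
# token = fetch_cloud_run_token("<cloud_run_url>")
# headers = auth_header(token)
```

You would then attach these headers to each request to the service, for example
alongside the Gen AI SDK or a plain HTTP client.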
What's next
Learn about best practices for securing and optimizing performance when you
deploy to Cloud Run from Google AI Studio.

Last updated 2025-05-23 UTC.