[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["缺少我需要的資訊","missingTheInformationINeed","thumb-down"],["過於複雜/步驟過多","tooComplicatedTooManySteps","thumb-down"],["過時","outOfDate","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["示例/程式碼問題","samplesCodeIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-01-29 (世界標準時間)。"],[],[],null,["# Deploy Gemma with Google Cloud\n\nThe Google Cloud platform provides many services for deploying and serving\nGemma open models, including the following:\n\n- [Vertex AI](#vertex_ai)\n- [Cloud Run](#run)\n- [Google Kubernetes Engine](#gke)\n- [Dataflow ML](#dataflow_ml)\n\nVertex AI\n---------\n\n[Vertex AI](https://cloud.google.com/vertex-ai) is a Google Cloud platform for\nrapidly building and scaling machine learning projects without requiring\nin-house MLOps expertise. Vertex AI provides a console where you can work with a\nlarge selection of models and offers end-to-end MLOps capabilities and a\nserverless experience for streamlined development.\n\nYou can use Vertex AI as the downstream application that serves Gemma, which is\navailable in\n[Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335),\na curated collection of models. For example, you could port weights from a Gemma\nimplementation, and use Vertex AI to serve that version of Gemma to get\npredictions.\n\nTo learn more, refer to the following pages:\n\n- [Introduction to Vertex AI](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform): Get started with Vertex AI.\n- [Gemma with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma): Use Gemma open models with Vertex AI.\n- [Fine-tune Gemma using KerasNLP and deploy to Vertex AI](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_kerasnlp_to_vertexai.ipynb): End-to-end notebook to fine-tune Gemma using Keras.\n\nCloud Run\n---------\n\nCloud Run is a fully managed platform to run your code, function, or container\non top of Google's highly scalable infrastructure.\n\nCloud Run offers on-demand, fast starting, scale to zero, pay-per-use GPUs\nallowing you to serve open models like Gemma.\n\nTo learn more about running Gemma on Cloud Run, refer to the following pages:\n\n- [Best practices for using GPUs on Cloud Run](https://cloud.google.com/run/docs/configuring/services/gpu-best-practices)\n- [Run Gemma inference on Cloud Run GPUs with Ollama](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-ollama)\n- [Run Gemma inference on Cloud Run GPUs with vLLM](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-vllm)\n- [Run Gemma inference on Cloud Run GPUs with Transformers.js](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-transformers-js)\n\nGoogle Kubernetes Engine (GKE)\n------------------------------\n\n[Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine) (GKE) is\na managed [Kubernetes](https://kubernetes.io/) service from Google Cloud that\nyou can use to deploy and operate containerized applications at scale using\nGoogle's infrastructure. 
Cloud Run
---------

Cloud Run is a fully managed platform to run your code, function, or container
on top of Google's highly scalable infrastructure.

Cloud Run offers on-demand, fast-starting, pay-per-use GPUs that scale to zero,
letting you serve open models like Gemma.

To learn more about running Gemma on Cloud Run, refer to the following pages:

- [Best practices for using GPUs on Cloud Run](https://cloud.google.com/run/docs/configuring/services/gpu-best-practices)
- [Run Gemma inference on Cloud Run GPUs with Ollama](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-ollama)
- [Run Gemma inference on Cloud Run GPUs with vLLM](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-vllm)
- [Run Gemma inference on Cloud Run GPUs with Transformers.js](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-transformers-js)

Google Kubernetes Engine (GKE)
------------------------------

[Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine) (GKE) is
a managed [Kubernetes](https://kubernetes.io/) service from Google Cloud that
you can use to deploy and operate containerized applications at scale using
Google's infrastructure. You can serve Gemma using Cloud Tensor Processing
Units (TPUs) and graphics processing units (GPUs) on GKE with these LLM serving
frameworks:

- [Serve Gemma using GPUs on GKE with vLLM](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm)
- [Serve Gemma using GPUs on GKE with TGI](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi)
- [Serve Gemma using GPUs on GKE with Triton and TensorRT-LLM](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tensortllm)
- [Serve Gemma using TPUs on GKE with JetStream](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-tpu-jetstream)

By serving Gemma on GKE, you can implement a robust, production-ready inference
serving solution with all the benefits of managed Kubernetes, including
efficient scalability and higher availability.

To learn more, refer to the following pages:

- [GKE overview](https://cloud.google.com/kubernetes-engine/docs/concepts/kubernetes-engine-overview): Get started with Google Kubernetes Engine (GKE).
- [AI/ML orchestration on GKE](https://cloud.google.com/kubernetes-engine/docs/integrations/ai-infra): Run optimized AI/ML workloads with GKE.

Dataflow ML
-----------

[Dataflow ML](https://cloud.google.com/dataflow/docs/machine-learning) is a
Google Cloud platform for deploying and managing complete machine learning
workflows. With Dataflow ML, you can prepare your data for model training with
data processing tools, then use models like Gemma to perform local and remote
inference with batch and streaming pipelines.

You can use Dataflow ML to seamlessly integrate Gemma into your Apache Beam
inference pipelines with a few lines of code, enabling you to ingest data,
verify and transform the data, feed text inputs into Gemma, and generate text
output, as in the sketch after the links below.

To learn more, refer to the following pages:

- [Use Gemma open models with Dataflow](https://cloud.google.com/dataflow/docs/machine-learning/gemma): Get started with Gemma in Dataflow.
- [Run inference with a Gemma open model](https://cloud.google.com/dataflow/docs/machine-learning/gemma-run-inference): Tutorial that uses Gemma in an Apache Beam inference pipeline.
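As a concrete illustration of the pattern described above, here is a minimal
sketch of running Gemma inside an Apache Beam pipeline with `RunInference`.
The `GemmaModelHandler` class below is a hypothetical custom handler modeled
on the Keras-based approach in the linked tutorial; it is not a built-in
Apache Beam class, and it assumes `keras-nlp` is installed and the
`gemma_2b_en` preset weights are accessible in the worker environment.

```python
# Minimal sketch: run Gemma in an Apache Beam pipeline with RunInference.
import apache_beam as beam
from apache_beam.ml.inference.base import ModelHandler, PredictionResult, RunInference


class GemmaModelHandler(ModelHandler):
    """Hypothetical handler: loads a KerasNLP Gemma preset and generates text."""

    def __init__(self, preset: str):
        self._preset = preset

    def load_model(self):
        # Assumes keras-nlp is installed and Gemma weights are accessible.
        import keras_nlp
        return keras_nlp.models.GemmaCausalLM.from_preset(self._preset)

    def run_inference(self, batch, model, inference_args=None):
        # Pair each prompt with its generated completion.
        for prompt in batch:
            yield PredictionResult(prompt, model.generate(prompt, max_length=64))


with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "Prompts" >> beam.Create(["Why is the sky blue?"])
        | "Gemma" >> RunInference(GemmaModelHandler("gemma_2b_en"))
        | "Print" >> beam.Map(print)
    )
```

The same handler works unchanged in a streaming pipeline; swapping
`beam.Create` for a streaming source such as a Pub/Sub read is what turns
this batch sketch into the streaming case the section mentions.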