Vertex AI는 사내 MLOps 전문 지식 없이도 머신러닝 프로젝트를 빠르게 빌드하고 확장할 수 있는 Google Cloud 플랫폼입니다. Vertex AI는 다양한 모델을 사용할 수 있는 콘솔을 제공하고 엔드 투 엔드 MLOps 기능과 간소화된 개발을 위한 서버리스 환경을 제공합니다.
Vertex AI를 선별된 모델 모음인 Model Garden에서 사용할 수 있는 Gemma를 제공하는 다운스트림 애플리케이션으로 사용할 수 있습니다. 예를 들어 Gemma 구현에서 가중치를 포팅하고 Vertex AI를 사용하여 해당 버전의 Gemma를 서빙하여 예측을 얻을 수 있습니다.
Google Kubernetes Engine (GKE)은 Google 인프라를 사용하여 컨테이너화된 애플리케이션을 대규모로 배포하고 운영하는 데 사용할 수 있는 Google Cloud의 관리형 Kubernetes 서비스입니다. 다음과 같은 LLM 제공 프레임워크를 사용하여 GKE에서 Cloud Tensor Processing Unit(TPU) 및 그래픽 처리 장치 (GPU)를 사용하여 Gemma를 제공할 수 있습니다.
Dataflow ML은 전체 머신러닝 워크플로를 배포하고 관리하기 위한 Google Cloud 플랫폼입니다. Dataflow ML을 사용하면 데이터 처리 도구로 모델 학습용 데이터를 준비한 다음 Gemma와 같은 모델을 사용하여 일괄 및 스트리밍 파이프라인으로 로컬 및 원격 추론을 실행할 수 있습니다.
Dataflow ML을 사용하면 몇 줄의 코드로 Gemma를 Apache Beam 추론 파이프라인에 원활하게 통합하여 데이터를 처리하고, 데이터를 확인 및 변환하고, 텍스트 입력을 Gemma에 제공하고, 텍스트 출력을 생성할 수 있습니다.
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["필요한 정보가 없음","missingTheInformationINeed","thumb-down"],["너무 복잡함/단계 수가 너무 많음","tooComplicatedTooManySteps","thumb-down"],["오래됨","outOfDate","thumb-down"],["번역 문제","translationIssue","thumb-down"],["샘플/코드 문제","samplesCodeIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-01-29(UTC)"],[],[],null,["# Deploy Gemma with Google Cloud\n\nThe Google Cloud platform provides many services for deploying and serving\nGemma open models, including the following:\n\n- [Vertex AI](#vertex_ai)\n- [Cloud Run](#run)\n- [Google Kubernetes Engine](#gke)\n- [Dataflow ML](#dataflow_ml)\n\nVertex AI\n---------\n\n[Vertex AI](https://cloud.google.com/vertex-ai) is a Google Cloud platform for\nrapidly building and scaling machine learning projects without requiring\nin-house MLOps expertise. Vertex AI provides a console where you can work with a\nlarge selection of models and offers end-to-end MLOps capabilities and a\nserverless experience for streamlined development.\n\nYou can use Vertex AI as the downstream application that serves Gemma, which is\navailable in\n[Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335),\na curated collection of models. For example, you could port weights from a Gemma\nimplementation, and use Vertex AI to serve that version of Gemma to get\npredictions.\n\nTo learn more, refer to the following pages:\n\n- [Introduction to Vertex AI](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform): Get started with Vertex AI.\n- [Gemma with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma): Use Gemma open models with Vertex AI.\n- [Fine-tune Gemma using KerasNLP and deploy to Vertex AI](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_kerasnlp_to_vertexai.ipynb): End-to-end notebook to fine-tune Gemma using Keras.\n\nCloud Run\n---------\n\nCloud Run is a fully managed platform to run your code, function, or container\non top of Google's highly scalable infrastructure.\n\nCloud Run offers on-demand, fast starting, scale to zero, pay-per-use GPUs\nallowing you to serve open models like Gemma.\n\nTo learn more about running Gemma on Cloud Run, refer to the following pages:\n\n- [Best practices for using GPUs on Cloud Run](https://cloud.google.com/run/docs/configuring/services/gpu-best-practices)\n- [Run Gemma inference on Cloud Run GPUs with Ollama](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-ollama)\n- [Run Gemma inference on Cloud Run GPUs with vLLM](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-vllm)\n- [Run Gemma inference on Cloud Run GPUs with Transformers.js](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-transformers-js)\n\nGoogle Kubernetes Engine (GKE)\n------------------------------\n\n[Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine) (GKE) is\na managed [Kubernetes](https://kubernetes.io/) service from Google Cloud that\nyou can use to deploy and operate containerized applications at scale using\nGoogle's infrastructure. You can serve Gemma using Cloud Tensor processing units\n(TPUs) and graphical processing units (GPUs) on GKE with these LLM serving\nframeworks:\n\n- [Serve Gemma using GPUs on GKE with vLLM](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm)\n- [Serve Gemma using GPUs on GKE with TGI](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi)\n- [Serve Gemma using GPUs on GKE with Triton and TensorRT-LLM](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tensortllm)\n- [Serve Gemma using TPUs on GKE with JetStream](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-tpu-jetstream)\n\nBy serving Gemma on GKE, you can implement a robust, production-ready inference\nserving solution with all the benefits of managed Kubernetes, including\nefficient scalability and higher availability.\n\nTo learn more, refer to the following pages:\n\n- [GKE\n overview](https://cloud.google.com/kubernetes-engine/docs/concepts/kubernetes-engine-overview): Get started with Google Kubernetes Engine (GKE)\n- [AI/ML orchestration on\n GKE](https://cloud.google.com/kubernetes-engine/docs/integrations/ai-infra): Run optimized AI/ML workloads with GKE\n\nDataflow ML\n-----------\n\n[Dataflow ML](https://cloud.google.com/dataflow/docs/machine-learning) is a\nGoogle Cloud platform for deploying and managing complete machine learning\nworkflows. With Dataflow ML, you can prepare your data for model training with\ndata processing tools, then use models like Gemma to perform local and remote\ninference with batch and streaming pipelines.\n\nYou can use Dataflow ML to seamlessly integrate Gemma into your Apache Beam\ninference pipelines with a few lines of code, enabling you to ingest data,\nverify and transform the data, feed text inputs into Gemma, and generate text\noutput.\n\nTo learn more, refer to the following pages:\n\n- [Use Gemma open models with\n Dataflow](https://cloud.google.com/dataflow/docs/machine-learning/gemma): Get started with Gemma in Dataflow.\n- [Run inference with a Gemma open\n model](https://cloud.google.com/dataflow/docs/machine-learning/gemma-run-inference): Tutorial that uses Gemma in an Apache Beam inference pipeline."]]