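Tutorial: Get started with the Gemini API

This tutorial shows how to access the Gemini API from a Node.js application.

In this tutorial, you'll learn how to do the following:

Set up your project, including your API key
Generate text from text-only input
Generate text from text-and-image input (multimodal)
Build multi-turn conversations (chat)
Use streaming for faster interactions

In addition, this tutorial contains sections about advanced use cases (like embeddings and counting tokens) as well as options for controlling content generation.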

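Prerequisites

This tutorial assumes that you're familiar with building applications with Node.js.

To complete this tutorial, make sure that your development environment meets the following requirements:

Node.js v18 or later
npm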

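Set up your project

Before calling the Gemini API, you need to set up your project, which includes setting up your API key, installing the SDK package, and initializing the model.

Set up your API key

To use the Gemini API, you'll need an API key. If you don't already have one, create a key.

Get an API key

Secure your API key

It's strongly recommended that you do not check an API key into your version control system. Instead, use a secrets store for your API key.

All the snippets in this tutorial assume that you're accessing your API key as an environment variable.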
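
As a minimal sketch of the environment-variable approach, the helper below reads the key and fails fast when it is missing. The variable name API_KEY matches the snippets in this tutorial; the function name getApiKey is hypothetical and not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): read the API key from the
// environment and fail fast with a clear message if it is missing.
// `env` defaults to process.env and can be replaced when testing.
function getApiKey(env = process.env) {
  const key = env.API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("API_KEY is not set. Export it before running this script.");
  }
  return key.trim();
}
```

You could then pass getApiKey() to the GoogleGenerativeAI constructor in place of process.env.API_KEY, so a missing key surfaces as a clear error at startup rather than as a failed request later.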

Install the SDK package

To use the Gemini API in your own application, install the GoogleGenerativeAI package for Node.js:

npm install @google/generative-ai

Initialize the generative model

Before you can make any API calls, you need to import and initialize the generative model:

const { GoogleGenerativeAI } = require("@google/generative-ai");

// Access your API key as an environment variable (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(process.env.API_KEY);

// ...

// The Gemini 1.5 models are versatile and work with most use cases
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash"});

// ...

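When specifying a model, note the following:

Use a model that's specific to your use case (for example, gemini-1.5-flash is for multimodal input). Within this guide, the instructions for each implementation list the recommended model for each use case.

Note: For detailed information about the available models, including their capabilities and rate limits, see Gemini models. There are options for requesting a rate limit increase if the defaults aren't sufficient.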

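Implement common use cases

Now that your project is set up, you can use the Gemini API to implement the following use cases:

Generate text from text-only input
Generate text from text-and-image input (multimodal)
Build multi-turn conversations (chat)
Use streaming for faster interactions

In the advanced use cases section, you can find information about the Gemini API and embeddings.

Generate text from text-only input

When the prompt input includes only text, use a Gemini 1.5 model with the generateContent method to generate text output: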

const { GoogleGenerativeAI } = require("@google/generative-ai");

// Access your API key as an environment variable (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(process.env.API_KEY);

async function run() {
  // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash"});

  const prompt = "Write a story about a magic backpack.";

  const result = await model.generateContent(prompt);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}

run();

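Generate text from text-and-image input (multimodal)

Gemini 1.5 Flash and 1.5 Pro can handle multimodal input, so you can input both text and images. Make sure to review the image requirements for prompts.

When the prompt input includes both text and images, use a Gemini 1.5 model with the generateContent method to generate text output: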

const { GoogleGenerativeAI } = require("@google/generative-ai");
const fs = require("fs");

// Access your API key as an environment variable (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(process.env.API_KEY);

// Converts local file information to a GoogleGenerativeAI.Part object.
function fileToGenerativePart(path, mimeType) {
  return {
    inlineData: {
      data: Buffer.from(fs.readFileSync(path)).toString("base64"),
      mimeType
    },
  };
}

async function run() {
  // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

  const prompt = "What's different between these pictures?";

  const imageParts = [
    fileToGenerativePart("image1.png", "image/png"),
    fileToGenerativePart("image2.jpeg", "image/jpeg"),
  ];

  const result = await model.generateContent([prompt, ...imageParts]);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}

run();
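
The fileToGenerativePart function above takes the MIME type as an explicit argument. As a small sketch, the hypothetical helper below infers it from the file extension. The mapping covers only the two formats used in the example above and is illustrative; it does not list what the API accepts.

```javascript
const path = require("path");

// Illustrative mapping only (an assumption): check the image requirements
// for prompts before relying on any particular format.
const MIME_TYPES = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
};

// Hypothetical helper (not part of the SDK): infer a MIME type from the
// file extension, and throw for extensions that are not in the mapping.
function inferMimeType(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mimeType = MIME_TYPES[ext];
  if (!mimeType) {
    throw new Error(`Unsupported image extension: "${ext}"`);
  }
  return mimeType;
}
```

With this in place, a call such as fileToGenerativePart("image1.png", inferMimeType("image1.png")) keeps the path and the MIME type from drifting apart.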

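Build multi-turn conversations (chat)

Using Gemini, you can build freeform conversations across multiple turns. The SDK simplifies the process by managing the state of the conversation, so unlike with generateContent, you don't have to store the conversation history yourself.

To build a multi-turn conversation (like chat), use a Gemini 1.5 model or the Gemini 1.0 Pro model, and initialize the chat by calling startChat(). Then use sendMessage() to send a new user message, which also appends the message and the response to the chat history.

There are two possible options for the role associated with the content in a conversation:

user: the role that provides the prompts. This value is the default for sendMessage calls.
model: the role that provides the responses. This role can be used when calling startChat() with existing history.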

const { GoogleGenerativeAI } = require("@google/generative-ai");

// Access your API key as an environment variable (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(process.env.API_KEY);

async function run() {
  // The Gemini 1.5 models are versatile and work with multi-turn conversations (like chat)
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash"});

  const chat = model.startChat({
    history: [
      {
        role: "user",
        parts: [{ text: "Hello, I have 2 dogs in my house." }],
      },
      {
        role: "model",
        parts: [{ text: "Great to meet you. What would you like to know?" }],
      },
    ],
    generationConfig: {
      maxOutputTokens: 100,
    },
  });

  const msg = "How many paws are in my house?";

  const result = await chat.sendMessage(msg);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}

run();
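
When the history passed to startChat() is assembled from stored data, it can help to check it against the two roles described above before sending it. The following is a minimal sketch; findHistoryProblems is a hypothetical helper, not an SDK function, and it checks only the role names and that each entry has a non-empty parts array.

```javascript
// The two roles described in this section.
const VALID_ROLES = new Set(["user", "model"]);

// Hypothetical helper (not part of the SDK): returns a list of
// human-readable problems found in a chat history array.
// An empty list means no problems were found by these checks.
function findHistoryProblems(history) {
  const problems = [];
  history.forEach((entry, index) => {
    if (!VALID_ROLES.has(entry.role)) {
      problems.push(`entry ${index}: unknown role "${entry.role}"`);
    }
    if (!Array.isArray(entry.parts) || entry.parts.length === 0) {
      problems.push(`entry ${index}: parts must be a non-empty array`);
    }
  });
  return problems;
}
```

Running this on the history from the example above returns an empty array, while an entry with a role such as "assistant" is reported by index.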

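Use streaming for faster interactions

By default, the model returns a response after completing the entire generation process. You can achieve faster interactions by not waiting for the entire result, and instead using streaming to handle partial results.

The following example shows how to implement streaming with the generateContentStream method to generate text from a text-and-image input prompt.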

//...

const result = await model.generateContentStream([prompt, ...imageParts]);

let text = '';
for await (const chunk of result.stream) {
  const chunkText = chunk.text();
  console.log(chunkText);
  text += chunkText;
}

//...

You can use a similar approach for text-only input and chat use cases.

// Use streaming with text-only input
const result = await model.generateContentStream(prompt);

To see how to instantiate a chat, refer to the chat example above.

// Use streaming with multi-turn conversations (like chat)
const result = await chat.sendMessageStream(msg);
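
The accumulation loop is the same in all three streaming variants, so it can be factored out. The sketch below assumes only what the snippets above show: an async iterable whose chunks expose a text() method. collectStreamText and fakeStream are hypothetical names; fakeStream is a stand-in for trying the helper without calling the API.

```javascript
// Accumulates text from any async iterable whose chunks expose text(),
// which is the shape of result.stream in the snippets above.
// onChunk is called with each partial text as it arrives.
async function collectStreamText(stream, onChunk = () => {}) {
  let text = "";
  for await (const chunk of stream) {
    const chunkText = chunk.text();
    onChunk(chunkText);
    text += chunkText;
  }
  return text;
}

// Stand-in stream for local experimentation (an assumption, not SDK code):
// yields one chunk-like object per string in `pieces`.
async function* fakeStream(pieces) {
  for (const piece of pieces) {
    yield { text: () => piece };
  }
}
```

With a real response, a call along the lines of collectStreamText(result.stream, (t) => process.stdout.write(t)) would print partial output as it arrives and still resolve to the full text.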

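Implement advanced use cases

The common use cases described in the previous section of this tutorial help you become comfortable with using the Gemini API. This section describes some use cases that might be considered more advanced.

Use embeddings

Embedding is a technique used to represent information as a list of floating point numbers in an array. With Gemini, you can represent text (words, sentences, and blocks of text) in a vectorized form, making it easier to compare and contrast embeddings. For example, two texts that share a similar subject matter or sentiment should have similar embeddings, which can be identified through mathematical comparison techniques such as cosine similarity.

Use the embedding-001 model with the embedContent method (or the batchEmbedContents method) to generate embeddings. The following example generates an embedding for a single string: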

const { GoogleGenerativeAI } = require("@google/generative-ai");

// Access your API key as an environment variable (see "Set up your API key" above)
const genAI = new GoogleGenerativeAI(process.env.API_KEY);

async function run() {
  // For embeddings, use the embedding-001 model
  const model = genAI.getGenerativeModel({ model: "embedding-001"});

  const text = "The quick brown fox jumps over the lazy dog.";

  const result = await model.embedContent(text);
  const embedding = result.embedding;
  console.log(embedding.values);
}

run();
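
The cosine similarity comparison mentioned above can be sketched in a few lines. This is a plain JavaScript illustration that works on any two equal-length numeric arrays, such as the embedding.values arrays returned by two embedContent calls; cosineSimilarity is a hypothetical helper name, not an SDK function.

```javascript
// Cosine similarity between two equal-length numeric vectors:
// dot(a, b) / (|a| * |b|). The result lies in [-1, 1]; values closer to 1
// mean the vectors point in more similar directions.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Two texts on a similar subject would be expected to score higher against each other than against an unrelated text, although the exact values depend on the embedding model.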

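Function calling

Function calling makes it easier for you to get structured data outputs from generative models. You can then use these outputs to call other APIs and return the relevant response data to the model. In other words, function calling helps you connect generative models to external systems so that the generated content includes the most up-to-date and accurate information. Learn more in the function calling tutorial.

Count tokens

When using long prompts, it might be useful to count tokens before sending any content to the model. The following examples show how to use countTokens() for various use cases: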

// For text-only input
const { totalTokens } = await model.countTokens(prompt);

// For text-and-image input (multimodal)
const { totalTokens } = await model.countTokens([prompt, ...imageParts]);

// For multi-turn conversations (like chat)
const history = await chat.getHistory();
const msgContent = { role: "user", parts: [{ text: msg }] };
const contents = [...history, msgContent];
const { totalTokens } = await model.countTokens({ contents });
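
One way to act on the count is to compare it against a budget before sending the request. The sketch below assumes only the countTokens() shape shown above (a promise resolving to an object with totalTokens). checkTokenBudget and fakeModel are hypothetical names, and the budget value is something you choose; it is not a limit taken from the API.

```javascript
// Hypothetical helper (not part of the SDK): count tokens for a request and
// report whether it fits within a caller-chosen budget.
// `model` is any object with a countTokens() method that resolves to
// { totalTokens }, as in the snippets above.
async function checkTokenBudget(model, request, maxInputTokens) {
  const { totalTokens } = await model.countTokens(request);
  return { totalTokens, fits: totalTokens <= maxInputTokens };
}

// Stand-in model for trying the helper offline (an assumption, not the SDK):
// it treats each whitespace-separated word as one token.
const fakeModel = {
  async countTokens(text) {
    return { totalTokens: text.trim().split(/\s+/).length };
  },
};
```

With a real model object, the same helper could gate a generateContent call, for example by shortening the prompt when fits is false.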

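Options to control content generation

You can control content generation by configuring model parameters and by using safety settings.

Note that passing generationConfig or safetySettings to a model request method (like generateContent) fully overrides the configuration object with the same name passed in getGenerativeModel.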
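
The difference between overriding and merging is easy to miss, so here is a plain JavaScript illustration of the behavior described above. It does not call the SDK and is not taken from its internals; effectiveGenerationConfig is a hypothetical function that only mirrors the stated rule.

```javascript
// Illustration only (not SDK internals): a request-level config, when
// present, replaces the model-level config as a whole. Fields are not merged.
function effectiveGenerationConfig(modelLevel, requestLevel) {
  return requestLevel !== undefined ? requestLevel : modelLevel;
}

const modelLevel = { temperature: 0.9, maxOutputTokens: 200 };
const requestLevel = { temperature: 0.2 };

const effective = effectiveGenerationConfig(modelLevel, requestLevel);
console.log(effective); // { temperature: 0.2 }  (maxOutputTokens is not inherited)

// To keep the model-level values, merge explicitly before passing the config.
const merged = { ...modelLevel, ...requestLevel };
console.log(merged); // { temperature: 0.2, maxOutputTokens: 200 }
```

In practice this means a per-request generationConfig should repeat every value you still want to apply, or be built by spreading the model-level object first.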

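Configure model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. Learn more about model parameters.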

const generationConfig = {
  stopSequences: ["red"],
  maxOutputTokens: 200,
  temperature: 0.9,
  topP: 0.1,
  topK: 16,
};

// The Gemini 1.5 models are versatile and work with most use cases
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash",  generationConfig });

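Use safety settings

You can use safety settings to adjust the likelihood of getting responses that may be considered harmful. By default, safety settings block content with a medium and/or high probability of being unsafe across all dimensions. Learn more about safety settings.

Here's how to set one safety setting: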

const { HarmBlockThreshold, HarmCategory } = require("@google/generative-ai");

// ...

const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
  },
];

// The Gemini 1.5 models are versatile and work with most use cases
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash", safetySettings });

You can also set more than one safety setting:

const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
];
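
When several categories share one threshold, the array can be generated instead of written out by hand. The following is a minimal sketch; buildSafetySettings is a hypothetical helper, not an SDK function. The string literals stand in for the HarmCategory and HarmBlockThreshold members so that the sketch runs without the SDK installed, which assumes the enum values are equal to their names.

```javascript
// Hypothetical helper (not part of the SDK): apply one threshold to
// several harm categories and return a safetySettings-shaped array.
function buildSafetySettings(categories, threshold) {
  return categories.map((category) => ({ category, threshold }));
}

// Assumption: these strings match HarmCategory.* and HarmBlockThreshold.*.
// In an application, pass the enum members from the SDK instead.
const sharedSafetySettings = buildSafetySettings(
  ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH"],
  "BLOCK_ONLY_HIGH"
);

console.log(sharedSafetySettings.length); // 2
```

The resulting array has the same shape as the hand-written safetySettings examples above, one object per category.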

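What's next

Prompt design is the process of creating prompts that elicit the desired response from language models. Writing well-structured prompts is an essential part of ensuring accurate, high-quality responses from a language model. Learn about best practices for prompt writing.
Gemini offers several model variations to meet the needs of different use cases, such as input types and complexity, implementations for chat or other dialog language tasks, and size constraints. Learn about the available Gemini models.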
