生成圖像

Gemini API 支援使用 Gemini 2.0 Flash Experimental 和 Imagen 3 產生圖片。本指南可協助您開始使用這兩種模型。

使用 Gemini 生成圖片

Gemini 2.0 Flash Experimental 支援輸出文字和內嵌圖片的功能。這可讓您使用 Gemini 以對話方式編輯圖片，或產生包含交織文字的輸出內容 (例如，在單一回合中產生包含文字和圖片的網誌文章)。所有生成的圖片都會附上 SynthID 浮水印，Google AI 工作室中的圖片也會附上可見浮水印。

以下範例說明如何使用 Gemini 2.0 產生文字和圖片輸出內容：

PythonJavaScriptREST

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
import base64

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
      response_modalities=['Text', 'Image']
    )
)

for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    image = Image.open(BytesIO((part.inline_data.data)))
    image.save('gemini-native-image.png')
    image.show()

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {

  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  const contents =
    "Hi, can you create a 3d rendered image of a pig " +
    "with wings and a top hat flying over a happy " +
    "futuristic scifi city with lots of greenery?";

  // Set responseModalities to include "Image" so the model can generate  an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-exp-image-generation",
    contents: contents,
    config: {
      responseModalities: ["Text", "Image"],
    },
  });
  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}
      ]
    }],
    "generationConfig":{"responseModalities":["Text","Image"]}
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-native-image.png

視提示和內容而定，Gemini 會以不同模式 (文字轉圖像、文字轉圖像和文字等) 產生內容。例如：

文字轉圖像
- 提示範例：「產生艾菲爾鐵塔的圖片，背景有煙火。」
文字轉圖和文字 (交錯)
- 提示範例：「產生燉飯的圖解食譜」。
圖片和文字轉換為圖片和文字 (交錯)
- 提示範例： (含有房間內家具的圖片)「我房間內的沙發適合哪些顏色？你能更新圖片嗎？」
圖片編輯 (文字和圖片轉圖片)
- 提示範例：「編輯這張圖片，讓圖片看起來像卡通」
- 提示範例：[貓咪圖片] + [抱枕圖片] +「請在這個抱枕上製作我貓咪的十字繡圖案。」
多輪圖片編輯 (聊天)
- 提示範例：[上傳藍色汽車的圖片]「將這輛車改裝成敞篷車。」「現在將顏色改為黃色。」

使用 Gemini 編輯圖片

如要執行圖片編輯作業，請新增圖片做為輸入內容。以下範例示範如何上傳 Base64 編碼圖片。如果是多張圖片和較大的酬載，請參閱「圖片輸入」一節。

PythonJavaScriptREST

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

import PIL.Image

image = PIL.Image.open('/path/to/image.png')

client = genai.Client()

text_input = ('Hi, This is a picture of me.'
            'Can you add a llama next to me?',)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
      response_modalities=['Text', 'Image']
    )
)

for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    image = Image.open(BytesIO(part.inline_data.data))
    image.show()

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {

  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  // Load the image from the local file system
  const imagePath = "path/to/image.png";
  const imageData = fs.readFileSync(imagePath);
  const base64Image = imageData.toString("base64");

  // Prepare the content parts
  const contents = [
    { text: "Can you add a llama next to the image?" },
    {
      inlineData: {
        mimeType: "image/png",
        data: base64Image,
      },
    },
  ];

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-exp-image-generation",
    contents: contents,
    config: {
      responseModalities: ["Text", "Image"],
    },
  });
  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

IMG_PATH=/path/to/your/image1.jpeg

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{
      \"contents\": [{
        \"parts\":[
            {\"text\": \"'Hi, This is a picture of me. Can you add a llama next to me\"},
            {
              \"inline_data\": {
                \"mime_type\":\"image/jpeg\",
                \"data\": \"$IMG_BASE64\"
              }
            }
        ]
      }],
      \"generationConfig\": {\"responseModalities\": [\"Text\", \"Image\"]}
    }"  \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-edited-image.png

限制

為獲得最佳成效，請使用以下語言：EN、es-MX、ja-JP、zh-CN、hi-IN。
圖像生成功能不支援音訊或影片輸入內容。
圖像生成功能不一定會觸發：
- 模型可能只會輸出文字。請嘗試明確要求圖片輸出內容 (例如「產生圖片」、「隨時提供圖片」、「更新圖片」)。
- 模型可能會在中途停止產生內容。請重試或改用其他提示。
為圖片產生文字時，如果先產生文字，然後要求 Gemini 產生含有文字的圖片，效果會最好。

選擇模型

您應該使用哪種模型產生圖像？視您的用途而定。

Gemini 2.0 最適合用於產生符合脈絡的圖片、混合文字和圖片、納入世界知識，以及推論圖片。您可以使用這項功能，在長篇文字序列中嵌入準確且與內容相關的圖像。你也可以使用自然語言，透過對話方式編輯圖片，同時在整個對話過程中維持脈絡。

如果您最重視圖像品質，建議使用 Imagen 3。Imagen 3 擅長處理擬真、藝術細節，以及印象派或動漫等特定藝術風格。對於專業的圖片編輯工作，例如更新產品背景、放大圖片，以及在視覺效果中加入品牌和風格，Imagen 3 也是不錯的選擇。您可以使用 Imagen 3 製作標誌或其他品牌產品設計。

使用 Imagen 3 生成圖片

Gemini API 可讓您存取 Imagen 3，這是 Google 品質最高的文字轉圖像模型，提供多項新功能和改良功能。Imagen 3 可執行下列操作：

產生比先前模型更清晰、光線更豐富的圖像，且雜訊較少
解讀以自然語言撰寫的提示
以多種格式和風格產生圖片
比先前模型更有效率地算繪文字

PythonJavaScriptREST

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client(api_key='GEMINI_API_KEY')

response = client.models.generate_images(
    model='imagen-3.0-generate-002',
    prompt='Robot holding a red skateboard',
    config=types.GenerateImagesConfig(
        number_of_images= 4,
    )
)
for generated_image in response.generated_images:
  image = Image.open(BytesIO(generated_image.image.image_bytes))
  image.show()

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {

  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  const response = await ai.models.generateImages({
    model: 'imagen-3.0-generate-002',
    prompt: 'Robot holding a red skateboard',
    config: {
      numberOfImages: 4,
    },
  });

  let idx = 1;
  for (const generatedImage of response.generatedImages) {
    let imgBytes = generatedImage.image.imageBytes;
    const buffer = Buffer.from(imgBytes, "base64");
    fs.writeFileSync(`imagen-${idx}.png`, buffer);
    idx++;
  }
}

main();

curl -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict?key=GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "instances": [
          {
            "prompt": "Robot holding a red skateboard"
          }
        ],
        "parameters": {
          "sampleCount": 4
        }
      }'

目前 Imagen 僅支援英文提示，以及下列參數：

Imagen 模型參數

(命名慣例依程式語言而異)。

numberOfImages：要產生的圖片數量，從 1 到 4 (含)。預設值為 4。
aspectRatio：變更產生圖片的顯示比例。支援的值為 "1:1"、"3:4"、"4:3"、"9:16" 和 "16:9"。預設為 "1:1"。
personGeneration：允許模型產生人物圖片。支援的值如下：
- "DONT_ALLOW"：禁止產生人物圖片。
- "ALLOW_ADULT"：產生成人圖片，但不產生兒童圖片。此為預設值。

後續步驟

如要進一步瞭解如何為 Imagen 撰寫提示，請參閱 Imagen 提示指南。
如要進一步瞭解 Gemini 2.0 模型，請參閱「Gemini 模型」和「實驗模型」。