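Generate images

The Gemini API supports image generation using the Gemini 2.0 Flash Experimental model and using Imagen 3. This guide helps you get started with both models.

Generate images using Gemini

Gemini 2.0 Flash Experimental can output text and inline images. This lets you use Gemini to edit images conversationally or to generate outputs with interwoven text (for example, a blog post with text and images in a single turn). All generated images include a SynthID watermark, and images in Google AI Studio also include a visible watermark.

The following example shows how to use Gemini 2.0 to generate text-and-image output: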

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
      response_modalities=['Text', 'Image']
    )
)

# The response can interleave text parts and image parts
for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    # inline_data.data holds the raw image bytes
    image = Image.open(BytesIO(part.inline_data.data))
    image.save('gemini-native-image.png')
    image.show()

Node.js

const { GoogleGenerativeAI } = require("@google/generative-ai");
const fs = require("fs");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function generateImage() {
  const contents = "Hi, can you create a 3d rendered image of a pig " +
                  "with wings and a top hat flying over a happy " +
                  "futuristic scifi city with lots of greenery?";

  // Set responseModalities to include "Image" so the model can generate an image
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp-image-generation",
    generationConfig: {
        responseModalities: ['Text', 'Image']
    },
  });

  try {
    const response = await model.generateContent(contents);
    for (const part of response.response.candidates[0].content.parts) {
      // Based on the part type, either show the text or save the image
      if (part.text) {
        console.log(part.text);
      } else if (part.inlineData) {
        const imageData = part.inlineData.data;
        const buffer = Buffer.from(imageData, 'base64');
        fs.writeFileSync('gemini-native-image.png', buffer);
        console.log('Image saved as gemini-native-image.png');
      }
    }
  } catch (error) {
    console.error("Error generating content:", error);
  }
}

generateImage();

REST

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}
      ]
    }],
    "generationConfig":{"responseModalities":["Text","Image"]}
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-native-image.png

The code examples should output an image and may also output text.

[Image: AI-generated image of a fantastical flying pig]
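
The REST example pulls the image out of the response with grep and cut, which depends on how the JSON happens to be formatted. As a hedged alternative, the sketch below parses the response body as JSON and decodes every inline image part. It assumes the REST response uses the same `candidates[0].content.parts[].inlineData.data` layout that the Node.js example reads; `extract_inline_images` is a hypothetical helper name, not part of any SDK.

```python
import base64
import json


def extract_inline_images(response_body: str) -> list[bytes]:
    """Return the decoded bytes of every inline image part in a response.

    ASSUMPTION: the generateContent JSON layout matches what the Node.js
    example reads: candidates[0].content.parts[], where image parts carry
    a base64 string in inlineData.data. Hypothetical helper.
    """
    payload = json.loads(response_body)
    candidates = payload.get("candidates") or []
    if not candidates:
        return []
    parts = candidates[0].get("content", {}).get("parts", [])
    images = []
    for part in parts:
        inline = part.get("inlineData")
        if inline and "data" in inline:
            # The image arrives base64-encoded inside the JSON string
            images.append(base64.b64decode(inline["data"]))
    return images
```

To use it, save the curl output to a file instead of piping it through grep, then pass the file's contents to the function and write each returned item to disk.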

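Depending on the prompt and context, Gemini generates content in different modes (text to image, text to image and text, and so on). Here are some examples:

Text to image
- Example prompt: "Generate an image of the Eiffel Tower with fireworks in the background."
Text to image(s) and text (interleaved)
- Example prompt: "Generate an illustrated recipe for a paella."
Image(s) and text to image(s) and text (interleaved)
- Example prompt: (with an image of a furnished room) "Could I switch to a sofa in a different color? Please update the image."
Image editing (text and image to image)
- Example prompt: "Edit this image to make it look like a cartoon."
- Example prompt: [image of a cat] + [image of a pillow] + "Create a cross stitch of my cat on this pillow."
Multi-turn image editing (chat)
- Example prompts: [upload an image of a blue car] "Turn this car into a convertible." "Now change the color to yellow."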
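
In the interleaved modes a single response can carry several images, while the Python and Node.js examples write every image to one fixed filename, so a later image would overwrite an earlier one. The sketch below is one way to handle that, assuming parts expose `.text` and `.inline_data.data` as in the Python example. The function name, the numbering scheme, and the `.png` extension are illustrative assumptions.

```python
from pathlib import Path


def save_response_parts(parts, out_dir="."):
    """Print text parts and save each image part under its own filename.

    `parts` is any iterable of objects exposing `.text` and
    `.inline_data.data` (raw image bytes). Returns the paths written.
    Hypothetical helper; the .png extension is an assumption.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        if getattr(part, "text", None) is not None:
            print(part.text)
        elif getattr(part, "inline_data", None) is not None:
            # Number the files so a later image never overwrites an earlier one
            path = out / f"gemini-image-{len(saved) + 1}.png"
            path.write_bytes(part.inline_data.data)
            saved.append(path)
    return saved
```

With the SDK response from the example, the call would be `save_response_parts(response.candidates[0].content.parts, "out")`.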

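Image editing with Gemini

To perform image editing, add an image as input. The following example demonstrates uploading a base64-encoded image. For multiple images and larger payloads, see the "Image input" section.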

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

# Load the image to edit from the local file system
image = Image.open('/path/to/image.png')

client = genai.Client()

text_input = ('Hi, This is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
      response_modalities=['Text', 'Image']
    )
)

for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    image = Image.open(BytesIO(part.inline_data.data))
    image.show()

Node.js

const { GoogleGenerativeAI } = require("@google/generative-ai");
const fs = require("fs");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function generateImage() {
    // Load the image from the local file system
    const imagePath = '/path/to/image.png';
    const imageData = fs.readFileSync(imagePath);
    const base64Image = imageData.toString('base64');

    // Prepare the content parts
    const contents = [
        { text: "Hi, This is a picture of me. Can you add a llama next to me?" },
        {
          inlineData: {
            mimeType: 'image/png',
            data: base64Image
          }
        }
      ];

  // Set responseModalities to include "Image" so the model can generate an image
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp-image-generation",
    generationConfig: {
        responseModalities: ['Text', 'Image']
    },
  });

  try {
    const response = await model.generateContent(contents);
    for (const part of response.response.candidates[0].content.parts) {
      // Based on the part type, either show the text or save the image
      if (part.text) {
        console.log(part.text);
      } else if (part.inlineData) {
        const imageData = part.inlineData.data;
        const buffer = Buffer.from(imageData, 'base64');
        fs.writeFileSync('gemini-native-image.png', buffer);
        console.log('Image saved as gemini-native-image.png');
      }
    }
  } catch (error) {
    console.error("Error generating content:", error);
  }
}

generateImage();

REST

IMG_PATH=/path/to/your/image1.jpeg

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{
      \"contents\": [{
        \"parts\":[
            {\"text\": \"Hi, This is a picture of me. Can you add a llama next to me?\"},
            {
              \"inline_data\": {
                \"mime_type\":\"image/jpeg\",
                \"data\": \"$IMG_BASE64\"
              }
            }
        ]
      }],
      \"generationConfig\": {\"responseModalities\": [\"Text\", \"Image\"]}
    }"  \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-edited-image.png
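
Hand-escaping JSON inside a shell string is easy to get wrong, especially once the prompt itself contains quotes. As a rough sketch, the same request body can be assembled in Python and serialized with `json.dumps`, which handles the escaping. The field names mirror the REST example; `build_edit_request` is a hypothetical helper name.

```python
import base64
import json


def build_edit_request(prompt: str, image_bytes: bytes,
                       mime_type: str = "image/jpeg") -> str:
    """Build the JSON body for an image-editing generateContent request.

    Mirrors the REST example: one text part, one inline_data part with
    base64 data, and responseModalities. Hypothetical helper.
    """
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {
                    "inline_data": {
                        "mime_type": mime_type,
                        # Decode to str: bytes are not JSON-serializable
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                },
            ]
        }],
        "generationConfig": {"responseModalities": ["Text", "Image"]},
    }
    return json.dumps(body)
```

The returned string can be written to a file and sent with `curl -d @body.json`, or posted with any HTTP client.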

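Limitations

For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
Image generation does not support audio or video inputs.
Image generation may not always trigger:
- The model may output text only. Try asking for image output explicitly (for example, "generate an image", "provide images as you go along", "update the image").
- The model may stop generating partway through. Try again or try a different prompt.
When generating text for an image, Gemini works best if you first generate the text and then ask for an image that contains it.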
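
The two workarounds for a missing image (retry, and ask for an image explicitly) can be combined in a small wrapper. The sketch below is one possible shape, not an SDK feature. It assumes a caller-supplied `generate(prompt)` function that returns a list of parts exposing `.inline_data`, for instance a thin wrapper around the `generate_content` call shown earlier. The function name and the appended sentence are illustrative.

```python
def generate_with_image_retry(generate, prompt, max_attempts=3):
    """Call `generate(prompt)` until the returned parts include an image.

    `generate` is a caller-supplied function returning a list of parts
    that expose `.inline_data`. Hypothetical helper.
    """
    parts = []
    for attempt in range(max_attempts):
        parts = generate(prompt)
        if any(getattr(p, "inline_data", None) is not None for p in parts):
            return parts
        if attempt == 0:
            # After the first miss, ask for image output explicitly
            prompt = prompt + " Please generate an image."
    # Every attempt came back without an image; return the last result
    return parts
```

The caller still needs to check whether the returned parts contain an image, since the wrapper gives up after `max_attempts` calls.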

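Choose a model

Which model should you use to generate images? It depends on your use case.

Gemini 2.0 is best for producing contextually relevant images, blending text and images, incorporating world knowledge, and reasoning about images. You can use it to create accurate, contextually relevant visuals embedded in long text sequences. You can also edit images conversationally in natural language while keeping context throughout the conversation.

If image quality is your top priority, Imagen 3 is the better choice. Imagen 3 excels at photorealism, artistic detail, and specific artistic styles such as impressionism or anime. It is also a good choice for specialized image editing tasks such as updating product backgrounds, upscaling images, and infusing branding and style into visuals. You can use Imagen 3 to create logos or other branded product designs.

Generate images using Imagen 3

The Gemini API provides access to Imagen 3, Google's highest-quality text-to-image model, which offers a number of new and improved capabilities. Imagen 3 can do the following:

Generate images with better detail, richer lighting, and fewer distracting artifacts than previous models
Understand prompts written in natural language
Generate images in a wide range of formats and styles
Render text more effectively than previous models

Imagen example

This section shows how to instantiate an Imagen model and generate images.

After you install the Google Gen AI SDK, you can use the following code to generate images: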

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client(api_key='GEMINI_API_KEY')

response = client.models.generate_images(
    model='imagen-3.0-generate-002',
    prompt='Fuzzy bunnies in my kitchen',
    config=types.GenerateImagesConfig(
        number_of_images=4,
    )
)
for generated_image in response.generated_images:
  image = Image.open(BytesIO(generated_image.image.image_bytes))
  image.show()

REST

curl -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict?key=GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "instances": [
          {
            "prompt": "Fuzzy bunnies in my kitchen"
          }
        ],
        "parameters": {
          "sampleCount": 4
        }
      }'

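The code example should output four images similar to this one:

[Image: AI-generated image of two fuzzy bunnies in a kitchen]

You can also try the Getting Started with Imagen notebook in the Gemini Cookbook.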
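
The REST example prints the raw JSON response without decoding the images. The sketch below shows one way to decode them. Note the assumption: this page does not document the response layout, so the `predictions` list and the `bytesBase64Encoded` field name are guesses based on the `:predict` endpoint style and should be checked against an actual response. `decode_predictions` is a hypothetical helper.

```python
import base64
import json


def decode_predictions(response_body: str,
                       field: str = "bytesBase64Encoded") -> list[bytes]:
    """Decode each image in a :predict response body.

    ASSUMPTION: the response looks like
    {"predictions": [{<field>: "<base64>"}, ...]}.
    The field name is not stated on this page; pass a different
    `field` if the actual response differs.
    """
    payload = json.loads(response_body)
    return [
        base64.b64decode(pred[field])
        for pred in payload.get("predictions", [])
        if field in pred
    ]
```

Each returned item is the raw bytes of one image and can be written to disk or opened with `Image.open(BytesIO(...))` as in the Python example.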

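Imagen model parameters

The following parameters are available for generate_images():

prompt: The text prompt for the image.
number_of_images: The number of images to generate, from 1 to 4 (inclusive). The default is 4.
aspect_ratio: Changes the aspect ratio of the generated image. Supported values are "1:1", "3:4", "4:3", "9:16", and "16:9". The default is "1:1".
safety_filter_level: Adds a filter level to safety filtering. The following values are valid:
- "BLOCK_LOW_AND_ABOVE": Blocks when the probability score or the severity score is LOW, MEDIUM, or HIGH.
- "BLOCK_MEDIUM_AND_ABOVE": Blocks when the probability score or the severity score is MEDIUM or HIGH.
- "BLOCK_ONLY_HIGH": Blocks when the probability score or the severity score is HIGH.
person_generation: Allows the model to generate images of people. The following values are supported:
- "DONT_ALLOW": Blocks generation of images of people.
- "ALLOW_ADULT": Generates images of adults, but not children. This is the default.

A non-visible digital SynthID watermark is always added to generated images.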
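
Because each parameter accepts only a small set of values, it can help to validate them locally before sending a request. The sketch below is a hypothetical helper, not part of the SDK. It copies the allowed values from the list above and returns keyword arguments intended to be unpacked into `types.GenerateImagesConfig(**kwargs)`. The page states no default for `safety_filter_level`, so the helper omits that key unless a value is given.

```python
ASPECT_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}
SAFETY_LEVELS = {"BLOCK_LOW_AND_ABOVE", "BLOCK_MEDIUM_AND_ABOVE", "BLOCK_ONLY_HIGH"}
PERSON_OPTIONS = {"DONT_ALLOW", "ALLOW_ADULT"}


def imagen_config_kwargs(number_of_images=4, aspect_ratio="1:1",
                         safety_filter_level=None,
                         person_generation="ALLOW_ADULT"):
    """Validate Imagen parameters and return them as keyword arguments.

    Hypothetical helper; the allowed values are copied from this page.
    """
    if not isinstance(number_of_images, int) or not 1 <= number_of_images <= 4:
        raise ValueError("number_of_images must be an integer from 1 to 4")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if person_generation not in PERSON_OPTIONS:
        raise ValueError(f"unsupported person_generation: {person_generation!r}")
    kwargs = {
        "number_of_images": number_of_images,
        "aspect_ratio": aspect_ratio,
        "person_generation": person_generation,
    }
    if safety_filter_level is not None:
        if safety_filter_level not in SAFETY_LEVELS:
            raise ValueError(f"unsupported safety_filter_level: {safety_filter_level!r}")
        kwargs["safety_filter_level"] = safety_filter_level
    return kwargs
```

For example, `types.GenerateImagesConfig(**imagen_config_kwargs(number_of_images=2, aspect_ratio="16:9"))` would request two widescreen images.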

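Text prompt language

The following input text prompt languages are supported:

English (en)

What's next

To learn more about prompt writing for Imagen, see the Imagen prompt guide.
To learn more about Gemini 2.0 models, see "Gemini models" and "Experimental models".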