Generate images

The Gemini API supports image generation using Gemini 2.0 Flash Experimental and Imagen 3. This guide explains how to get started with both models.

Generate images using Gemini

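Gemini 2.0 Flash Experimental can output text and inline images. This lets you use Gemini to edit images conversationally or to generate output with interwoven text (for example, a blog post with text and images in a single turn). All generated images include a SynthID watermark, and images in Google AI Studio also include a visible watermark.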

The following example shows how to use Gemini 2.0 to generate text-and-image output.

Python

from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

# With no arguments, the client reads the API key from the environment.
client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
        response_modalities=['Text', 'Image']
    )
)

# The response can contain both text parts and image parts; handle each kind.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        # inline_data.data holds the raw image bytes.
        image = Image.open(BytesIO(part.inline_data.data))
        image.save('gemini-native-image.png')
        image.show()

Node.js

const { GoogleGenerativeAI } = require("@google/generative-ai");
const fs = require("fs");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function generateImage() {
  const contents = "Hi, can you create a 3d rendered image of a pig " +
                  "with wings and a top hat flying over a happy " +
                  "futuristic scifi city with lots of greenery?";

  // Set responseModalities to include "Image" so the model can generate an image
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp-image-generation",
    generationConfig: {
        responseModalities: ['Text', 'Image']
    },
  });

  try {
    const response = await model.generateContent(contents);
    for (const part of response.response.candidates[0].content.parts) {
      // Based on the part type, either show the text or save the image
      if (part.text) {
        console.log(part.text);
      } else if (part.inlineData) {
        const imageData = part.inlineData.data;
        const buffer = Buffer.from(imageData, 'base64');
        fs.writeFileSync('gemini-native-image.png', buffer);
        console.log('Image saved as gemini-native-image.png');
      }
    }
  } catch (error) {
    console.error("Error generating content:", error);
  }
}

generateImage();

REST

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}
      ]
    }],
    "generationConfig":{"responseModalities":["Text","Image"]}
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-native-image.png
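
The `grep`/`cut` pipeline in the REST example keeps only the first match pattern and depends on the exact spacing of the JSON output. As a more forgiving alternative, the response body can be parsed as JSON. The following is a minimal sketch, not part of the official samples. It assumes the REST response nests image parts under `candidates[].content.parts[].inlineData.data` (base64 text); the function name `extract_inline_images` is hypothetical.

```python
import base64
import json


def extract_inline_images(response_json: str) -> list[bytes]:
    """Return the decoded bytes of every inline image in a response body."""
    response = json.loads(response_json)
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            # Assumption: image parts use the camelCase key "inlineData";
            # the snake_case spelling is accepted as a fallback.
            inline = part.get("inlineData") or part.get("inline_data")
            if inline and "data" in inline:
                images.append(base64.b64decode(inline["data"]))
    return images
```

To use it, save the `curl` output to a file, read the file as text, pass it to `extract_inline_images`, and write each returned `bytes` object to its own `.png` file.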

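Gemini generates content in different modes depending on the prompt and context, such as text to image, or text to image and text. Some examples:

Text to image
- Example prompt: "Generate an image of the Eiffel Tower with fireworks in the background."
Text to image(s) and text (interleaved)
- Example prompt: "Generate an illustrated recipe for paella."
Image(s) and text to image(s) and text (interleaved)
- Example prompt: (with an image of a furnished room) "What other sofa colors would work in this room? Please update the image."
Image editing (text and image to image)
- Example prompt: "Edit this image to make it look like a cartoon."
- Example prompt: [image of a cat] + [image of a pillow] + "Create a cross stitch of my cat on this pillow."
Multi-turn image editing (chat)
- Example prompts: [upload an image of a blue car] "Turn this car into a convertible." "Now change the color to yellow."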
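
For the interleaved modes, the order of the returned parts matters, because each image belongs next to the text around it. The sketch below shows one way to keep that order by writing the parts out as Markdown. It is an illustration under assumptions: parts are plain dicts shaped like the REST output (`{"text": ...}` or `{"inlineData": {"data": <base64>}}`), every image is saved with a `.png` extension, and `render_interleaved` is a hypothetical helper name.

```python
import base64
from pathlib import Path


def render_interleaved(parts: list[dict], out_dir: str, stem: str = "figure") -> str:
    """Convert ordered text/image parts to Markdown, saving images to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    blocks = []
    image_count = 0
    for part in parts:
        if part.get("text"):
            blocks.append(part["text"].strip())
        elif part.get("inlineData"):
            image_count += 1
            # Assumption: the image is a PNG; check the MIME type in real use.
            path = out / f"{stem}-{image_count}.png"
            path.write_bytes(base64.b64decode(part["inlineData"]["data"]))
            blocks.append(f"![{stem} {image_count}]({path.name})")
    # Blank lines between blocks keep text and images as separate paragraphs.
    return "\n\n".join(blocks)
```

For an illustrated recipe, for example, the returned string can be written to `recipe.md` in the same directory as the saved images.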

Image editing with Gemini

To edit an image, add the image as input. The following example demonstrates uploading a base64-encoded image. For multiple images or larger payloads, see the image input section.

Python

from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

# Load the image to edit from disk.
image = Image.open('/path/to/image.png')

client = genai.Client()

# Adjacent string literals are concatenated, so keep the space between sentences.
text_input = ('Hi, This is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
        response_modalities=['Text', 'Image']
    )
)

# Print any text parts and display any returned images.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        edited_image = Image.open(BytesIO(part.inline_data.data))
        edited_image.show()

Node.js

const { GoogleGenerativeAI } = require("@google/generative-ai");
const fs = require("fs");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function generateImage() {
  // Load the image from the local file system
  const imagePath = '/path/to/image.png';
  const imageData = fs.readFileSync(imagePath);
  const base64Image = imageData.toString('base64');

  // Prepare the content parts
  const contents = [
    { text: "Hi, This is a picture of me. Can you add a llama next to me?" },
    {
      inlineData: {
        mimeType: 'image/png',
        data: base64Image
      }
    }
  ];

  // Set responseModalities to include "Image" so the model can generate an image
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp-image-generation",
    generationConfig: {
        responseModalities: ['Text', 'Image']
    },
  });

  try {
    const response = await model.generateContent(contents);
    for (const part of response.response.candidates[0].content.parts) {
      // Based on the part type, either show the text or save the image
      if (part.text) {
        console.log(part.text);
      } else if (part.inlineData) {
        const imageData = part.inlineData.data;
        const buffer = Buffer.from(imageData, 'base64');
        fs.writeFileSync('gemini-native-image.png', buffer);
        console.log('Image saved as gemini-native-image.png');
      }
    }
  } catch (error) {
    console.error("Error generating content:", error);
  }
}

generateImage();

REST

IMG_PATH=/path/to/your/image1.jpeg

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{
      \"contents\": [{
        \"parts\":[
            {\"text\": \"Hi, This is a picture of me. Can you add a llama next to me?\"},
            {
              \"inline_data\": {
                \"mime_type\":\"image/jpeg\",
                \"data\": \"$IMG_BASE64\"
              }
            }
        ]
      }],
      \"generationConfig\": {\"responseModalities\": [\"Text\", \"Image\"]}
    }"  \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-edited-image.png
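
Building the JSON body inside a double-quoted shell string requires escaping every quote, which is easy to get wrong. As an alternative, the same request body can be assembled in Python and written to a file for `curl -d @body.json`. This is a minimal sketch that mirrors the field names used in the shell example above (`inline_data`, `mime_type`, `generationConfig`); the function name `build_edit_request` is hypothetical.

```python
import base64
import json


def build_edit_request(prompt: str, image_bytes: bytes, mime_type: str) -> str:
    """Return a JSON request body with a text part and a base64 image part."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {
                    "inline_data": {
                        "mime_type": mime_type,
                        # JSON cannot carry raw bytes, so encode them as base64 text.
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                },
            ]
        }],
        "generationConfig": {"responseModalities": ["Text", "Image"]},
    }
    return json.dumps(body)
```

Because `json.dumps` handles quoting, prompts that contain quotes or apostrophes need no manual escaping.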

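Limitations

For best performance, use one of the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
Image generation does not support audio or video inputs.
Image generation may not always trigger:
- The model may output text only. Try asking for image output explicitly (for example, "generate an image", "provide images as you go along", "update the image").
- The model may stop generating partway through. Try again, or try a different prompt.
When generating text for an image, Gemini works best if you first generate the text and then ask for an image that contains it.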
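
The workaround for text-only output (retry, and ask for an image explicitly) can be sketched as a small helper. This is an illustration under assumptions, not an official pattern: `generate` is a hypothetical callable that wraps whatever API call you use and returns the response parts as plain dicts, where an image part is one that contains an `inlineData` key.

```python
from typing import Callable, Optional

# A "part" here is a plain dict such as {"text": ...} or {"inlineData": {...}}.
Part = dict


def has_image(parts: list[Part]) -> bool:
    """Report whether any part carries inline image data."""
    return any("inlineData" in part for part in parts)


def generate_image_with_fallback(
    generate: Callable[[str], list[Part]],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[list[Part]]:
    """Call generate until the result contains an image, or give up."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        parts = generate(attempt_prompt)
        if has_image(parts):
            return parts
        # Text-only or truncated output: ask for the image explicitly and retry.
        attempt_prompt = f"{prompt} Generate an image."
    return None
```

The helper returns `None` when no attempt produces an image, so the caller can decide whether to try a different prompt.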

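Choose a model

Which model should you use to generate images? It depends on your use case.

Gemini 2.0 is best for producing contextually relevant images, blending text and images, incorporating world knowledge, and reasoning about images. You can use it to create accurate, contextually relevant visuals embedded in long text sequences. You can also edit images conversationally, using natural language, while maintaining context throughout the conversation.

If image quality is your top priority, Imagen 3 is the better choice. Imagen 3 excels at photorealism, artistic detail, and specific artistic styles such as impressionism or anime. Imagen 3 is also well suited to specialized image editing tasks such as updating product backgrounds, upscaling images, and infusing branding and style into visuals. You can use Imagen 3 to create logos and other branded product designs.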

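Generate images using Imagen 3

The Gemini API provides access to Imagen 3, Google's highest quality text-to-image model, which features a number of new and improved capabilities. Imagen 3 can do the following:

Generate images with better detail, richer lighting, and fewer distracting artifacts than previous models
Understand prompts written in natural language
Generate images in a wide range of formats and styles
Render text more effectively than previous models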

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

# Replace the placeholder with your own API key.
client = genai.Client(api_key='GEMINI_API_KEY')

response = client.models.generate_images(
    model='imagen-3.0-generate-002',
    prompt='Fuzzy bunnies in my kitchen',
    config=types.GenerateImagesConfig(
        number_of_images=4,
    )
)

# Each generated image exposes its raw bytes via image.image_bytes.
for generated_image in response.generated_images:
    image = Image.open(BytesIO(generated_image.image.image_bytes))
    image.show()

REST

curl -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict?key=$GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "instances": [
          {
            "prompt": "Fuzzy bunnies in my kitchen"
          }
        ],
        "parameters": {
          "sampleCount": 4
        }
      }'

Image: AI-generated image of two fuzzy bunnies in a kitchen

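Imagen currently supports English-only prompts and the following parameters.

Imagen model parameters

number_of_images: The number of images to generate, from 1 to 4 (inclusive). The default is 4.
aspect_ratio: Changes the aspect ratio of the generated image. Supported values are "1:1", "3:4", "4:3", "9:16", and "16:9". The default is "1:1".
person_generation: Allows the model to generate images of people. The following values are supported:
- "DONT_ALLOW": Blocks generation of images of people.
- "ALLOW_ADULT": Generates images of adults, but not children. This is the default.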
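
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical helper named `imagen_config`; it only validates the documented ranges and returns a plain dict. It assumes the resulting keys could be passed as keyword arguments to `types.GenerateImagesConfig`, which the Python example above shows accepting `number_of_images`.

```python
ASPECT_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}
PERSON_GENERATION = {"DONT_ALLOW", "ALLOW_ADULT"}


def imagen_config(
    number_of_images: int = 4,
    aspect_ratio: str = "1:1",
    person_generation: str = "ALLOW_ADULT",
) -> dict:
    """Validate Imagen parameters against the documented values."""
    if not 1 <= number_of_images <= 4:
        raise ValueError("number_of_images must be between 1 and 4")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if person_generation not in PERSON_GENERATION:
        raise ValueError(f"unsupported person_generation: {person_generation!r}")
    return {
        "number_of_images": number_of_images,
        "aspect_ratio": aspect_ratio,
        "person_generation": person_generation,
    }
```

Calling `imagen_config(number_of_images=2, aspect_ratio="16:9")` returns a dict with those two values and the default `person_generation`, while an unsupported value raises `ValueError` before any network call is made.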

What's next

To learn how to write prompts for Imagen, see the Imagen prompt guide.
To learn more about Gemini 2.0 models, see Gemini models and Experimental models.