Image understanding

Note: This page covers the new Interactions API, which is currently in beta.
For stable production deployments, we recommend that you continue to use the generateContent API. You can use the toggle on this page to switch between versions.

Gemini models are built from the ground up to be multimodal, which unlocks a wide range of image processing and computer vision tasks, including image captioning, classification, and visual question answering, without having to train specialized ML models.

In addition to their general multimodal capabilities, Gemini models offer improved accuracy for specific use cases, such as object detection and segmentation, through additional training.

Passing images to Gemini

You can provide images as input to Gemini in the following ways:

Passing images by URL: Ideal for publicly accessible images.
Passing inline image data: For Base64-encoded image data.
Uploading images using the Files API: Recommended for larger files or for reusing images across multiple requests.

Passing images by URL

You can upload an image using the Files API and pass its URI in the request:

Python

from google import genai

client = genai.Client()

uploaded_file = client.files.upload(file="path/to/organ.jpg")

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Caption this image."},
        {
            "type": "image",
            "uri": uploaded_file.uri,
            "mime_type": uploaded_file.mime_type
        }
    ]
)
print(interaction.steps[-1].content[0].text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const uploadedFile = await client.files.upload({
    file: "path/to/organ.jpg",
    config: { mimeType: "image/jpeg" }
});

const interaction = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: [
        {type: "text", text: "Caption this image."},
        {
            type: "image",
            uri: uploadedFile.uri,
            mime_type: uploadedFile.mimeType
        }
    ]
});
console.log(interaction.steps.at(-1).content[0].text);

REST

# First upload the file using the Files API, then use the URI:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-flash-preview",
    "input": [
      {"type": "text", "text": "Caption this image."},
      {
        "type": "image",
        "uri": "YOUR_FILE_URI",
        "mime_type": "image/jpeg"
      }
    ]
  }'

Passing inline image data

You can provide image data as a Base64-encoded string:

Python

import base64

from google import genai

with open('path/to/small-sample.jpg', 'rb') as f:
    image_bytes = f.read()

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Caption this image."},
        {
            "type": "image",
            "data": base64.b64encode(image_bytes).decode('utf-8'),
            "mime_type": "image/jpeg"
        }
    ]
)
print(interaction.steps[-1].content[0].text)

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const client = new GoogleGenAI({});
const base64ImageFile = fs.readFileSync("path/to/small-sample.jpg", {
  encoding: "base64",
});

const interaction = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: [
        {type: "text", text: "Caption this image."},
        {
            type: "image",
            data: base64ImageFile,
            mime_type: "image/jpeg"
        }
    ]
});
console.log(interaction.steps.at(-1).content[0].text);

REST

IMG_PATH="/path/to/your/image1.jpg"

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-flash-preview",
    "input": [
      {"type": "text", "text": "Caption this image."},
      {
        "type": "image",
        "data": "'"$(base64 $B64FLAGS $IMG_PATH)"'",
        "mime_type": "image/jpeg"
      }
    ]
  }'

Uploading images using the Files API

For large files, or to reuse the same image file across requests, use the Files API. See the Files API guide.

Python

from google import genai

client = genai.Client()

my_file = client.files.upload(file="path/to/sample.jpg")

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "Caption this image."},
        {
            "type": "image",
            "uri": my_file.uri,
            "mime_type": my_file.mime_type
        }
    ]
)
print(interaction.steps[-1].content[0].text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const myfile = await client.files.upload({
    file: "path/to/sample.jpg",
    config: { mimeType: "image/jpeg" },
});

const interaction = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: [
        {type: "text", text: "Caption this image."},
        {
            type: "image",
            uri: myfile.uri,
            mime_type: myfile.mimeType
        }
    ]
});
console.log(interaction.steps.at(-1).content[0].text);

REST

# First upload the file (see Files API guide for details)
# Then use the file URI in the request:

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-flash-preview",
    "input": [
      {"type": "text", "text": "Caption this image."},
      {
        "type": "image",
        "uri": "YOUR_FILE_URI",
        "mime_type": "image/jpeg"
      }
    ]
  }'

Prompting with multiple images

You can provide multiple images in a single prompt by including multiple image objects in the input array:

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": "What is different between these two images?"},
        {
            "type": "image",
            "uri": "https://example.com/image1.jpg",
            "mime_type": "image/jpeg"
        },
        {
            "type": "image",
            "uri": "https://example.com/image2.jpg",
            "mime_type": "image/jpeg"
        }
    ]
)
print(interaction.steps[-1].content[0].text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: "gemini-3-flash-preview",
    input: [
        {type: "text", text: "What is different between these two images?"},
        {
            type: "image",
            uri: "https://example.com/image1.jpg",
            mime_type: "image/jpeg"
        },
        {
            type: "image",
            uri: "https://example.com/image2.jpg",
            mime_type: "image/jpeg"
        }
    ]
});
console.log(interaction.steps.at(-1).content[0].text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-flash-preview",
    "input": [
      {"type": "text", "text": "What is different between these two images?"},
      {
        "type": "image",
        "uri": "https://example.com/image1.jpg",
        "mime_type": "image/jpeg"
      },
      {
        "type": "image",
        "uri": "https://example.com/image2.jpg",
        "mime_type": "image/jpeg"
      }
    ]
  }'
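
When the number of images varies, the input array can be assembled programmatically. The following is a minimal sketch rather than part of the SDK: `build_multi_image_input` is a hypothetical helper, and it assumes that every image shares one MIME type and that the items follow the same shape as the requests above.

```python
def build_multi_image_input(prompt, image_uris, mime_type="image/jpeg"):
    """Return an input list: one text item followed by one image item per URI.

    Hypothetical helper (not part of the google-genai SDK); it only mirrors
    the item shape used in the examples above.
    """
    items = [{"type": "text", "text": prompt}]
    for uri in image_uris:
        items.append({"type": "image", "uri": uri, "mime_type": mime_type})
    return items


example_input = build_multi_image_input(
    "What is different between these two images?",
    ["https://example.com/image1.jpg", "https://example.com/image2.jpg"],
)
print(len(example_input))  # one text item plus two image items
```

The returned list can then be passed as the input argument of client.interactions.create.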

Object detection

The models are trained to detect objects in an image and return their bounding box coordinates. The coordinates are normalized to [0, 1000] relative to the image dimensions, so you need to rescale them to the original image size.

Python

from google import genai
from PIL import Image
import json

client = genai.Client()
prompt = "Detect all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000."

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": prompt},
        {
            "type": "image",
            "uri": "https://example.com/image.png",
            "mime_type": "image/png"
        }
    ],
    response_format={
        "type": "text",
        "mime_type": "application/json"
    }
)

bounding_boxes = json.loads(interaction.steps[-1].content[0].text)
print("Bounding boxes:", bounding_boxes)
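
Because the returned coordinates are normalized to 0-1000, they need to be converted to pixels before drawing or cropping. A minimal sketch of that conversion follows. `to_pixel_box` is a hypothetical helper name, and the (left, top, right, bottom) output order is an assumption chosen to match what imaging libraries such as Pillow commonly expect.

```python
def to_pixel_box(box_2d, image_width, image_height):
    """Convert [ymin, xmin, ymax, xmax] on a 0-1000 scale to pixel coordinates.

    Returns (left, top, right, bottom) in pixels. Hypothetical helper, not
    part of the SDK.
    """
    ymin, xmin, ymax, xmax = box_2d
    left = round(xmin * image_width / 1000)
    top = round(ymin * image_height / 1000)
    right = round(xmax * image_width / 1000)
    bottom = round(ymax * image_height / 1000)
    return left, top, right, bottom


# Illustrative values only, not real model output.
print(to_pixel_box([100, 200, 500, 800], image_width=1000, image_height=500))
```

Note that the model reports the y coordinate first, while the helper returns the x coordinate first.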

For more examples, see the notebooks in the Gemini Cookbook.

Segmentation

Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks.

The model predicts a JSON list in which each item represents a segmentation mask. Each item has a bounding box ("box_2d") in the format [y0, x0, y1, x1] with coordinates normalized between 0 and 1000, a label ("label") that identifies the object, and the segmentation mask inside the bounding box as a Base64-encoded PNG, which is a probability map with values between 0 and 255.

Python

from google import genai
from PIL import Image
import json

client = genai.Client()

prompt = """
Give the segmentation masks for the wooden and glass items.
Output a JSON list of segmentation masks where each entry contains the 2D
bounding box in the key "box_2d", the segmentation mask in key "mask", and
the text label in the key "label". Use descriptive labels.
"""

interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input=[
        {"type": "text", "text": prompt},
        {
            "type": "image",
            "uri": "https://example.com/image.png",
            "mime_type": "image/png"
        }
    ],
    config={
        "thinking_level": "minimal"  # Minimize thinking for better detection results
    }
)

items = json.loads(interaction.steps[-1].content[0].text)
print("Segmentation results:", items)
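
Each returned mask still has to be decoded before it can be used. The sketch below covers two steps using only the standard library: turning the "mask" string into PNG bytes, and binarizing a probability map. Several details are assumptions rather than documented behavior: the helper names are hypothetical, the optional data-URI prefix handling is a guess at how the string may be delivered, and the threshold of 127 is simply the midpoint of the 0-255 range.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_mask_png(mask_field):
    """Decode a Base64 mask string into raw PNG bytes.

    Assumption: the string may carry a data-URI prefix such as
    "data:image/png;base64,", which is dropped if present.
    """
    _, _, payload = mask_field.rpartition("base64,")
    png_bytes = base64.b64decode(payload)
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("mask does not decode to a PNG image")
    return png_bytes


def binarize(probability_rows, threshold=127):
    """Turn rows of 0-255 probabilities into rows of 0/1 values.

    The threshold is an assumed default (midpoint of the range).
    """
    return [[1 if value > threshold else 0 for value in row]
            for row in probability_rows]


# Illustrative values only, not real model output.
print(binarize([[0, 128], [255, 127]]))
```

Reading pixel values out of the PNG bytes requires an image library such as Pillow. The decoded mask covers only the bounding box, so it must be resized to the box dimensions before being overlaid on the full image.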

Figure: A table with cupcakes, with the wooden and glass objects highlighted. An example segmentation output with objects and segmentation masks.

Supported image formats

Gemini supports the following image format MIME types:

PNG - image/png
JPEG - image/jpeg
WebP - image/webp
HEIC - image/heic
HEIF - image/heif
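
To fill in the mime_type field for local files, the list above can be expressed as a lookup table. This is a minimal sketch under stated assumptions: `image_mime_type` is a hypothetical helper, and the mapping from file extensions to MIME types is a common convention rather than something the API defines.

```python
from pathlib import Path

# MIME types from the list above; the extension keys are an assumption.
SUPPORTED_IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".heic": "image/heic",
    ".heif": "image/heif",
}


def image_mime_type(path):
    """Return the MIME type for a supported image file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_IMAGE_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported image extension: {suffix!r}") from None


print(image_mime_type("path/to/sample.jpg"))
```

The extension is only a hint about the file contents, so a mislabeled file would still be sent with the wrong MIME type.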

To learn about other file input methods, see the file input methods guide.

Capabilities

All Gemini model versions are multimodal and can be used in a wide range of image processing and computer vision tasks, including image captioning, visual question answering, image classification, and object detection and segmentation.

Depending on your quality and performance requirements, Gemini can reduce the need for specialized ML models.

In addition to their general capabilities, the latest model versions are specifically trained to improve accuracy on specialized tasks, such as enhanced object detection and segmentation.

Limitations and key technical information

File limit

Gemini models support a maximum of 3,600 image files per request.

Token calculation

An image costs 258 tokens if both of its dimensions are <= 384 pixels. Larger images are tiled into 768x768 pixel tiles, each of which costs 258 tokens.

A rough formula for calculating the number of tiles is as follows:

Calculate the crop unit size, which is roughly floor(min(width, height) / 1.5).
Divide each dimension by the crop unit size, round each result up, and multiply the two values to get the number of tiles.

For example, a 960x540 image has a crop unit size of 360. Dividing each dimension by 360 and rounding up gives 3 and 2, so the number of tiles is 3 * 2 = 6.
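
The rough formula above can be sketched as a small estimator. `estimate_image_tokens` is a hypothetical helper, and rounding each quotient up is an assumption inferred from the 960x540 example (3 * 2 = 6). The result is an approximation only and may not match the token count the API actually reports.

```python
import math

TOKENS_PER_TILE = 258
SMALL_IMAGE_MAX_DIMENSION = 384


def estimate_image_tokens(width, height):
    """Roughly estimate the token cost of one image from its pixel dimensions."""
    if width <= SMALL_IMAGE_MAX_DIMENSION and height <= SMALL_IMAGE_MAX_DIMENSION:
        return TOKENS_PER_TILE
    # Crop unit size: roughly floor(min(width, height) / 1.5).
    crop_unit = math.floor(min(width, height) / 1.5)
    # Assumption: partial tiles count as whole tiles, hence the ceil.
    tiles = math.ceil(width / crop_unit) * math.ceil(height / crop_unit)
    return tiles * TOKENS_PER_TILE


print(estimate_image_tokens(960, 540))  # 6 tiles * 258 tokens = 1548
```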

Media resolution

Gemini 3 introduces granular control over multimodal vision processing with the media_resolution parameter. The media_resolution parameter determines the maximum number of tokens allocated per input image or video frame. Higher resolutions improve the model's ability to read fine text or identify small details, but they increase token usage and latency.

Tips and best practices

Verify that images are correctly rotated.
Use clear, non-blurry images.
When using a single image with text, place the text prompt before the image in the input array.

What's next

This guide shows how to upload image files and generate text output from image inputs. To learn more, see the following resources:

Files API: Learn more about uploading and managing files for use with Gemini.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
Safety guidance: Generative AI models sometimes produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs.