The Interactions API is now generally available. We recommend using this API to access all the latest features and models.

Image understanding

Gemini models are built to be multimodal from the ground up. This enables a wide range of image processing and computer vision tasks, including but not limited to image captioning, classification, and visual question answering, without having to train specialized ML models.

In addition to their general multimodal capabilities, Gemini models offer improved accuracy for specific use cases, such as object detection and segmentation, through additional training.

Passing images to Gemini

You can provide images as input to Gemini in several ways:

Pass an image by URL: ideal for publicly accessible images.
Pass inline image data: suitable for base64-encoded image data.
Upload images using the Files API: recommended for larger files or for reusing images across multiple requests.

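Pass an image by URL

The examples below upload an image with the Files API and pass the returned URI in the request. The same uri field carries public image URLs in the multi-image examples later in this guide.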

Python

from google import genai

client = genai.Client()

uploaded_file = client.files.upload(file="path/to/organ.jpg")

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "text", "text": "Caption this image."},
        {
            "type": "image",
            "uri": uploaded_file.uri,
            "mime_type": uploaded_file.mime_type
        }
    ]
)
print(interaction.output_text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const uploadedFile = await client.files.upload({
    file: "path/to/organ.jpg",
    config: { mimeType: "image/jpeg" }
});

const interaction = await client.interactions.create({
    model: "gemini-3.5-flash",
    input: [
        {type: "text", text: "Caption this image."},
        {
            type: "image",
            uri: uploadedFile.uri,
            mime_type: uploadedFile.mimeType
        }
    ]
});
console.log(interaction.output_text);

REST

# First upload the file using the Files API, then use the URI:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3.5-flash",
    "input": [
      {"type": "text", "text": "Caption this image."},
      {
        "type": "image",
        "uri": "YOUR_FILE_URI",
        "mime_type": "image/jpeg"
      }
    ]
  }'

Pass inline image data

You can provide image data as base64-encoded strings:

Python

import base64
from google import genai

with open('path/to/small-sample.jpg', 'rb') as f:
    image_bytes = f.read()

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "text", "text": "Caption this image."},
        {
            "type": "image",
            "data": base64.b64encode(image_bytes).decode('utf-8'),
            "mime_type": "image/jpeg"
        }
    ]
)
print(interaction.output_text)

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const client = new GoogleGenAI({});
const base64ImageFile = fs.readFileSync("path/to/small-sample.jpg", {
  encoding: "base64",
});

const interaction = await client.interactions.create({
    model: "gemini-3.5-flash",
    input: [
        {type: "text", text: "Caption this image."},
        {
            type: "image",
            data: base64ImageFile,
            mime_type: "image/jpeg"
        }
    ]
});
console.log(interaction.output_text);

REST

IMG_PATH="/path/to/your/image1.jpg"

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3.5-flash",
    "input": [
      {"type": "text", "text": "Caption this image."},
      {
        "type": "image",
        "data": "'"$(base64 $B64FLAGS $IMG_PATH)"'",
        "mime_type": "image/jpeg"
      }
    ]
  }'
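
The inline examples above all build the same kind of image part: base64 text plus a MIME type. As a minimal sketch, that step can be wrapped in a small helper. The name inline_image_part is hypothetical and not part of the SDK; the dictionary keys mirror the ones used in the samples above.

```python
import base64


def inline_image_part(image_bytes: bytes, mime_type: str) -> dict:
    """Build an inline image part shaped like the dictionaries in this guide.

    Hypothetical helper for illustration; it only prepares the dictionary
    and does not call the API.
    """
    return {
        "type": "image",
        # Encode to standard base64 and decode to str so the part is
        # JSON-serializable.
        "data": base64.b64encode(image_bytes).decode("utf-8"),
        "mime_type": mime_type,
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
part = inline_image_part(b"abc", "image/png")
print(part["data"])  # YWJj
```

The returned dictionary can be placed in the input list next to a text part, as in the Python sample above.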

Upload images using the Files API

For large files, or to use the same image file repeatedly, use the Files API. See the Files API guide.

Python

from google import genai

client = genai.Client()

my_file = client.files.upload(file="path/to/sample.jpg")

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "text", "text": "Caption this image."},
        {
            "type": "image",
            "uri": my_file.uri,
            "mime_type": my_file.mime_type
        }
    ]
)
print(interaction.output_text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const myfile = await client.files.upload({
    file: "path/to/sample.jpg",
    config: { mimeType: "image/jpeg" },
});

const interaction = await client.interactions.create({
    model: "gemini-3.5-flash",
    input: [
        {type: "text", text: "Caption this image."},
        {
            type: "image",
            uri: myfile.uri,
            mime_type: myfile.mimeType
        }
    ]
});
console.log(interaction.output_text);

REST

# First upload the file (see Files API guide for details)
# Then use the file URI in the request:

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3.5-flash",
    "input": [
      {"type": "text", "text": "Caption this image."},
      {
        "type": "image",
        "uri": "YOUR_FILE_URI",
        "mime_type": "image/jpeg"
      }
    ]
  }'

Prompting with multiple images

You can provide multiple images in a single request by including multiple image items in the input array:

Python

from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "text", "text": "What is different between these two images?"},
        {
            "type": "image",
            "uri": "https://example.com/image1.jpg",
            "mime_type": "image/jpeg"
        },
        {
            "type": "image",
            "uri": "https://example.com/image2.jpg",
            "mime_type": "image/jpeg"
        }
    ]
)
print(interaction.output_text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: "gemini-3.5-flash",
    input: [
        {type: "text", text: "What is different between these two images?"},
        {
            type: "image",
            uri: "https://example.com/image1.jpg",
            mime_type: "image/jpeg"
        },
        {
            type: "image",
            uri: "https://example.com/image2.jpg",
            mime_type: "image/jpeg"
        }
    ]
});
console.log(interaction.output_text);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3.5-flash",
    "input": [
      {"type": "text", "text": "What is different between these two images?"},
      {
        "type": "image",
        "uri": "https://example.com/image1.jpg",
        "mime_type": "image/jpeg"
      },
      {
        "type": "image",
        "uri": "https://example.com/image2.jpg",
        "mime_type": "image/jpeg"
      }
    ]
  }'
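
When the number of images varies, the input array can be assembled programmatically. The following is a sketch only: build_input is a hypothetical helper, and it produces the same list-of-dictionaries shape as the samples above, with the text part first.

```python
def build_input(prompt: str, images: list[tuple[str, str]]) -> list[dict]:
    """Return an input list with one text part followed by the image parts.

    `images` is a list of (uri, mime_type) pairs. Hypothetical helper for
    illustration; it does not call the API.
    """
    parts = [{"type": "text", "text": prompt}]
    for uri, mime_type in images:
        parts.append({"type": "image", "uri": uri, "mime_type": mime_type})
    return parts


request_input = build_input(
    "What is different between these two images?",
    [
        ("https://example.com/image1.jpg", "image/jpeg"),
        ("https://example.com/image2.jpg", "image/jpeg"),
    ],
)
print(len(request_input))  # 3
```

The resulting list can be passed as the input argument in the Python sample above.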

Object detection

The models are trained to detect objects in an image and return their bounding box coordinates. The coordinates are relative to the image dimensions and scaled to the range [0, 1000]. You need to scale them back to pixel values based on the original image size.

Python

from google import genai
from pydantic import BaseModel, Field
from typing import List

client = genai.Client()
prompt = "Detect all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000."

class BoundingBox(BaseModel):
    box_2d: List[int] = Field(description="The 2D bounding box of the item as [ymin, xmin, ymax, xmax] normalized to 0-1000.")
    mask: List[List[int]] = Field(description="The segmentation mask of the item as a polygon of [x,y] coordinates, normalized to 0-1000.")
    label: str = Field(description="A descriptive label for the item.")

class BoundingBoxes(BaseModel):
    boxes: List[BoundingBox]

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "text", "text": prompt},
        {
            "type": "image",
            "uri": "https://example.com/image.png",
            "mime_type": "image/png"
        }
    ],
    response_format={
        "type": "text",
        "mime_type": "application/json",
        "schema": BoundingBoxes.model_json_schema()
    }
)

bounding_boxes = BoundingBoxes.model_validate_json(interaction.output_text)
print(bounding_boxes)

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as z from "zod";

const client = new GoogleGenAI({});
const prompt = "Detect all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000.";

const boundingBoxesSchema = z.object({
  boxes: z.array(z.object({
    box_2d: z.array(z.number()),
    mask: z.array(z.array(z.number())),
    label: z.string()
  }))
});

const interaction = await client.interactions.create({
  model: "gemini-3.5-flash",
  input: [
    { type: "text", text: prompt },
    {
      type: "image",
      uri: "https://example.com/image.png",
      mime_type: "image/png"
    }
  ],
  response_format: {
    type: 'text',
    mime_type: 'application/json',
    schema: z.toJSONSchema(boundingBoxesSchema)
  },
});

const result = boundingBoxesSchema.parse(JSON.parse(interaction.output_text));
console.log(result);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3.5-flash",
    "input": [
      {"type": "text", "text": "Detect all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000."},
      {
        "type": "image",
        "uri": "https://example.com/image.png",
        "mime_type": "image/png"
      }
    ],
    "response_format": {
      "type": "text",
      "mime_type": "application/json",
      "schema": {
        "type": "object",
        "properties": {
          "boxes": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "box_2d": { "type": "array", "items": { "type": "integer" } },
                "mask": { "type": "array", "items": { "type": "array", "items": { "type": "integer" } } },
                "label": { "type": "string" }
              },
              "required": ["box_2d", "mask", "label"]
            }
          }
        },
        "required": ["boxes"]
      }
    }
  }'

For more examples, see the Gemini Cookbook.
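
Scaling a returned box back to pixel coordinates might look like the sketch below. The function name descale_box and the (left, top, right, bottom) output order are illustrative choices, not part of the API. Integer arithmetic is used to avoid floating-point rounding surprises.

```python
def descale_box(box_2d: list[int], width: int, height: int) -> tuple[int, int, int, int]:
    """Convert [ymin, xmin, ymax, xmax] in the 0-1000 range to pixel values.

    Returns (left, top, right, bottom) in pixels. Hypothetical helper;
    the output order is an illustrative choice.
    """
    ymin, xmin, ymax, xmax = box_2d
    # x values scale with the image width, y values with the image height.
    return (
        xmin * width // 1000,
        ymin * height // 1000,
        xmax * width // 1000,
        ymax * height // 1000,
    )


# A box covering the middle of a 1920 x 1080 image.
print(descale_box([100, 250, 500, 750], 1920, 1080))  # (480, 108, 1440, 540)
```

The width and height must come from the original image, since the model only returns normalized values.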

Segmentation

Gemini models not only detect items but also segment them and provide their contour masks.

The model predicts a JSON list in which each item represents a segmentation mask. Each item has a bounding box ("box_2d") in the format [ymin, xmin, ymax, xmax] with coordinates normalized between 0 and 1000, a label ("label") that identifies the object, and the segmentation mask inside the bounding box, given as a polygon of [x, y] coordinates normalized between 0 and 1000.

Python

from google import genai
from pydantic import BaseModel, Field
from typing import List

client = genai.Client()

prompt = """
Give the segmentation masks for the wooden and glass items.
Output a JSON list of segmentation masks where each entry contains the 2D
bounding box in the key "box_2d", the segmentation mask in key "mask", and
the text label in the key "label". Use descriptive labels.
"""

class BoundingBox(BaseModel):
    box_2d: List[int] = Field(description="The 2D bounding box of the item as [ymin, xmin, ymax, xmax] normalized to 0-1000.")
    mask: List[List[int]] = Field(description="The segmentation mask of the item as a polygon of [x,y] coordinates, normalized to 0-1000.")
    label: str = Field(description="A descriptive label for the item.")

class BoundingBoxes(BaseModel):
    boxes: List[BoundingBox]

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "text", "text": prompt},
        {
            "type": "image",
            "uri": "https://example.com/image.png",
            "mime_type": "image/png"
        }
    ],
    response_format={
        "type": "text",
        "mime_type": "application/json",
        "schema": BoundingBoxes.model_json_schema()
    },
    generation_config={
        "thinking_level": "minimal"
    }
)

items = BoundingBoxes.model_validate_json(interaction.output_text)
print("Segmentation results:", items)

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as z from "zod";

const client = new GoogleGenAI({});
const prompt = `
Give the segmentation masks for the wooden and glass items.
Output a JSON list of segmentation masks where each entry contains the 2D
bounding box in the key "box_2d", the segmentation mask in key "mask", and
the text label in the key "label". Use descriptive labels.
`;

const boundingBoxesSchema = z.object({
  boxes: z.array(z.object({
    box_2d: z.array(z.number()),
    mask: z.array(z.array(z.number())),
    label: z.string()
  }))
});

const interaction = await client.interactions.create({
  model: "gemini-3.5-flash",
  input: [
    { type: "text", text: prompt },
    {
      type: "image",
      uri: "https://example.com/image.png",
      mime_type: "image/png"
    }
  ],
  response_format: {
    type: 'text',
    mime_type: 'application/json',
    schema: z.toJSONSchema(boundingBoxesSchema)
  },
  generation_config: {
    thinking_level: "minimal"
  }
});

const result = boundingBoxesSchema.parse(JSON.parse(interaction.output_text));
console.log(result);

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3.5-flash",
    "input": [
      {"type": "text", "text": "Give the segmentation masks for the wooden and glass items.\nOutput a JSON list of segmentation masks where each entry contains the 2D\nbounding box in the key \"box_2d\", the segmentation mask in key \"mask\", and\nthe text label in the key \"label\". Use descriptive labels."},
      {
        "type": "image",
        "uri": "https://example.com/image.png",
        "mime_type": "image/png"
      }
    ],
    "response_format": {
      "type": "text",
      "mime_type": "application/json",
      "schema": {
        "type": "object",
        "properties": {
          "boxes": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "box_2d": { "type": "array", "items": { "type": "integer" } },
                "mask": { "type": "array", "items": { "type": "array", "items": { "type": "integer" } } },
                "label": { "type": "string" }
              },
              "required": ["box_2d", "mask", "label"]
            }
          }
        },
        "required": ["boxes"]
      }
    },
    "generation_config": {
      "thinking_level": "minimal"
    }
  }'

Image: a table with cupcakes, with the wooden and glass objects highlighted. An example of segmentation output with objects and segmentation masks.
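
To draw or measure a returned mask, its polygon points need to be in pixels. The sketch below assumes the [x, y] values are normalized to the full image, as the schema description in the samples suggests; if they turn out to be relative to the bounding box, the scaling would need to change. The name polygon_to_pixels is hypothetical.

```python
def polygon_to_pixels(mask: list[list[int]], width: int, height: int) -> list[tuple[int, int]]:
    """Scale polygon points from the 0-1000 range to pixel coordinates.

    Assumption: points are normalized to the whole image, not to the
    bounding box. Hypothetical helper for illustration.
    """
    return [(x * width // 1000, y * height // 1000) for x, y in mask]


# A triangle on a 640 x 480 image.
print(polygon_to_pixels([[0, 0], [500, 0], [500, 1000]], 640, 480))
# [(0, 0), (320, 0), (320, 480)]
```

The resulting list of (x, y) tuples is the shape most drawing libraries expect for polygon outlines.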

Supported image formats

Gemini supports the following image format MIME types:

PNG - image/png
JPEG - image/jpeg
WebP - image/webp
HEIC - image/heic
HEIF - image/heif

To learn about other file input methods, see the File input methods guide.
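
A request needs the right mime_type for each image. One way to derive it from a file name is sketched below. The extension-to-type mapping is an assumption based on conventional file extensions; only the MIME types themselves come from the list above. The name image_mime_type is hypothetical.

```python
from pathlib import Path

# MIME types from the list above; the extensions are conventional
# assumptions, not taken from the API documentation.
SUPPORTED_IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".heic": "image/heic",
    ".heif": "image/heif",
}


def image_mime_type(path: str) -> str:
    """Return the MIME type for a supported image file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_IMAGE_MIME_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    return SUPPORTED_IMAGE_MIME_TYPES[suffix]


print(image_mime_type("path/to/sample.JPG"))  # image/jpeg
```

An extension check does not validate the file contents, so it is a convenience rather than a guarantee that the API will accept the image.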

Capabilities

All Gemini model versions are multimodal and can be used in a wide range of image processing and computer vision tasks, including but not limited to image captioning, visual question answering, image classification, object detection, and segmentation.

Depending on your quality and performance requirements, Gemini can reduce the need for specialized ML models.

In addition to their general capabilities, the latest model versions are specifically trained to improve accuracy on specialized tasks, such as enhanced object detection and segmentation.

Limitations and key technical information

File limit

Gemini models support a maximum of 3,600 image files per request.
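
For workloads with more images than one request allows, a client can split the list into batches before sending. This is a sketch under that assumption; batch_images and MAX_IMAGES_PER_REQUEST are hypothetical names, and only the 3,600 figure comes from this guide.

```python
MAX_IMAGES_PER_REQUEST = 3600  # per-request limit stated in this guide


def batch_images(image_uris: list[str], batch_size: int = MAX_IMAGES_PER_REQUEST) -> list[list[str]]:
    """Split a list of image URIs into batches that each fit in one request."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        image_uris[start:start + batch_size]
        for start in range(0, len(image_uris), batch_size)
    ]


uris = [f"image-{i}" for i in range(7500)]  # placeholder identifiers
print([len(batch) for batch in batch_images(uris)])  # [3600, 3600, 300]
```

Each batch then goes into its own request, so any prompt that compares images across batches would need a different strategy.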

Token calculation

An image costs 258 tokens if both dimensions are at most 384 pixels. Larger images are tiled into 768 × 768 pixel tiles, and each tile costs 258 tokens.

A rough formula for calculating the number of tiles is as follows:

Calculate the crop unit size, which is roughly: floor(min(width, height) / 1.5).
Divide each dimension by the crop unit size and multiply the results to get the number of tiles.

For example, an image with dimensions 960 × 540 has a crop unit size of 360. Dividing each dimension by 360 gives about 2.67 and 1.5; counting whole tiles, that is 3 × 2 = 6 tiles.
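
The rough formula above can be written as a short estimator. Two details are assumptions rather than documented behavior: each quotient is rounded up to a whole tile, which matches the 960 × 540 example, and the crop unit is kept at 1 or more to avoid division by zero on extremely narrow images. The function names are hypothetical, and actual billing may differ from this estimate.

```python
import math

TOKENS_PER_TILE = 258       # cost per tile, from this guide
SMALL_IMAGE_MAX_SIDE = 384  # at or below this on both sides: one flat cost


def estimate_tile_count(width: int, height: int) -> int:
    """Approximate the number of tiles for an image larger than 384 px."""
    crop_unit = max(1, math.floor(min(width, height) / 1.5))
    # Assumption: partial tiles count as whole tiles (round up).
    return math.ceil(width / crop_unit) * math.ceil(height / crop_unit)


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate the token cost of one image using the rough formula."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    if width <= SMALL_IMAGE_MAX_SIDE and height <= SMALL_IMAGE_MAX_SIDE:
        return TOKENS_PER_TILE
    return estimate_tile_count(width, height) * TOKENS_PER_TILE


print(estimate_tile_count(960, 540))    # 6
print(estimate_image_tokens(960, 540))  # 1548
print(estimate_image_tokens(300, 200))  # 258
```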

Media resolution

Gemini 3 introduces granular control over multimodal vision processing through the media_resolution parameter. This parameter sets the maximum number of tokens allocated to each input image or video frame. Higher resolutions improve the model's ability to read fine text or identify small details, but they increase token usage and latency.

Tips and best practices

Verify that images are correctly rotated.
Use clear, non-blurry images.
When using a single image with text, place the text prompt before the image in the input array.

What's next

This guide shows you how to upload image files and generate text outputs from image inputs. To learn more, see the following resources:

Files API: Learn more about uploading and managing files for use with Gemini.
System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
Safety guidance: Generative AI models sometimes produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs.