The Interactions API is now generally available. We recommend using it to access the latest features and models.

Computer Use

The Computer Use tool lets you build browser, mobile, and desktop control agents that interact with and automate tasks. Using screenshots, the model can "see" the computer screen and "act" by generating specific UI actions such as mouse clicks and keyboard inputs. As with function calling, you need to implement a client-side execution environment that receives and executes the Computer Use actions.

Gemini 3.5 Flash is the recommended model for Computer Use and introduces several new capabilities:

- Multi-environment support: Build agents for browser, mobile, and desktop environments.
- Simplified actions with intents: Actions include an intent field that explains the model's reasoning behind each step.
- Configurable safety policies: Fine-tune safety behavior with built-in policy categories and overrides.
- Prompt injection detection: Opt-in screenshot scanning that identifies hidden adversarial instructions.

With Computer Use, you can build agents that:

- Automate repetitive data entry or form filling on websites.
- Perform automated testing of web applications and user flows.
- Conduct research across websites (for example, gathering product information, prices, and reviews from e-commerce sites to inform a purchase).

Here is a simple example of enabling the Computer Use tool:

Python

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Search for 'Gemini API' on Google.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            computer_use=types.ComputerUse(
                environment=types.Environment.ENVIRONMENT_BROWSER,
            )
        )]
    )
)

print(response.text)

JavaScript

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: 'gemini-3.5-flash',
  contents: "Search for 'Gemini API' on Google.",
  config: {
    tools: [{
      computerUse: {
        environment: "ENVIRONMENT_BROWSER",
      }
    }]
  }
});

console.log(response.text);

How Computer Use works

To build an agent with the Computer Use model, you need to set up a continuous loop between your application and the API. Here is what your code does at each step:

1. Send a request to the model
- Your application sends an API request containing the Computer Use tool, your configuration settings (such as the target environment), the user prompt, and a screenshot of the current screen.
2. Receive the model response
- The model analyzes the screen and the prompt and returns a response that includes a suggested function_call representing a UI action (such as a click, scroll, or key press).
- For Gemini 3.5 Flash, the response also includes a reasoning intent that explains why the model chose that action.
- The response may also include a safety_decision from a built-in safety system that classifies the action as regular/allowed, requiring user confirmation (require_confirmation), or blocked.
3. Execute the received action
- If the action is allowed (or the user confirms it), your client-side code parses the function_call, scales the normalized coordinates to match your viewport, and executes the action in your target environment using automation tools (such as Playwright). If the action is blocked, your client must stop execution or handle the interruption.
4. Capture the new environment state
- After the action finishes executing, your application takes a new screenshot and sends it to the model in a function_result to request the next step.

The process then repeats from step 2, continuing to request the next action from the model until the task is completed or terminated.
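The safety handling described in steps 2 and 3 can be sketched as a small decision function. This is a minimal illustration, not the SDK's API: the function name `next_step`, the string labels "regular" and "blocked", and the returned values are assumptions made for this sketch; only `require_confirmation` appears in the description above.

```python
from typing import Callable, Optional


def next_step(safety_decision: Optional[str], confirm: Callable[[], bool]) -> str:
    """Decide what the client does with one proposed action.

    Returns "execute", "skip", or "stop". The labels "regular" and
    "blocked" are assumed names for this sketch.
    """
    # No decision attached, or an explicitly allowed action: run it.
    if safety_decision is None or safety_decision == "regular":
        return "execute"
    # The action needs a human in the loop: ask, then run or skip.
    if safety_decision == "require_confirmation":
        return "execute" if confirm() else "skip"
    # Blocked (or any label the client does not recognize): stop the loop.
    return "stop"
```

Treating unrecognized labels like a blocked action keeps the client conservative if new decision values are introduced later.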

Figure: Computer Use overview

How to implement Computer Use

Before building with the Computer Use tool, you need to set up the following:

- Secure execution environment: Run your agent in a sandboxed virtual machine or container to isolate it from your host system and limit its potential impact. The reference implementation includes a ready-to-use Docker-based sandbox that you can use as a starting point.
- Client-side action handler: Implement client-side logic to execute actions at the given coordinates, type text, and capture screenshots.

The following examples use a web browser as the execution environment and Playwright as the client-side handler.

0. Set up Playwright

First, install the required packages:

pip install google-genai playwright
playwright install chromium

Then, initialize a Playwright browser instance:

from playwright.sync_api import sync_playwright

# 1. Configure screen dimensions for the target environment
SCREEN_WIDTH = 1440
SCREEN_HEIGHT = 900

# 2. Start the Playwright browser
# In production, utilize a sandboxed environment.
playwright = sync_playwright().start()
# Set headless=False to see the actions performed on your screen
browser = playwright.chromium.launch(headless=False)

# 3. Create a context and page with the specified dimensions
context = browser.new_context(
    viewport={"width": SCREEN_WIDTH, "height": SCREEN_HEIGHT}
)
page = context.new_page()

# 4. Navigate to an initial page to start the task
page.goto("https://www.google.com")

# The 'page', 'SCREEN_WIDTH', and 'SCREEN_HEIGHT' variables
# will be used in the steps below.

1. Send a request to the model

Initialize the client library and configure the Computer Use tool. You don't need to specify the screen size in the request: the model returns coordinates normalized to a 0-999 grid, and your client scales them to the actual width and height of the screen.

Gemini 3.5 Flash (recommended)

Python

Use the google-genai Python SDK (version 2.7.0 or later) to configure a request targeting the browser environment:

from google import genai
from google.genai.types import (
    Content,
    Part,
    GenerateContentConfig,
    Tool,
    ComputerUse,
    Environment,
    ThinkingConfig,
)

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=[
        Content(
            role="user",
            parts=[
                Part(text="Find a flight from SF to Hawaii on Jun 30th, coming back on Jul 6th"),
            ],
        )
    ],
    config=GenerateContentConfig(
        tools=[
            Tool(
                computer_use=ComputerUse(
                    environment=Environment.ENVIRONMENT_BROWSER,
                    enable_prompt_injection_detection=True,
                ),
            ),
        ],
        thinking_config=ThinkingConfig(
            include_thoughts=True
        ),
    )
)

print(response.text)

JavaScript

Use the @google/genai Node.js SDK to configure a request targeting the browser environment:

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: 'gemini-3.5-flash',
  contents: [
    {
      role: 'user',
      parts: [{ text: "Find a flight from SF to Hawaii on Jun 30th, coming back on Jul 6th" }]
    }
  ],
  config: {
    tools: [{
      computerUse: {
        environment: "ENVIRONMENT_BROWSER",
        enable_prompt_injection_detection: true
      }
    }],
    thinkingConfig: {
      includeThoughts: true
    }
  }
});

console.log(response.text);

REST

Use curl to send the request:

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Find me a flight from SF to Hawaii on Jun 30th, coming back on Jul 6th. Start by navigating directly to flights.google.com"
          }
        ]
      }
    ],
    "tools": [
      {
        "computer_use": {
          "environment": "ENVIRONMENT_BROWSER",
          "enable_prompt_injection_detection": true
        }
      }
    ]
  }'

Gemini 2.5 (legacy)

Python

from google import genai
from google.genai import types
from google.genai.types import Content, Part

client = genai.Client()

# Specify predefined functions to exclude (optional)
excluded_functions = ["drag_and_drop"]

generate_content_config = types.GenerateContentConfig(
    tools=[
        types.Tool(
            computer_use=types.ComputerUse(
                environment=types.Environment.ENVIRONMENT_BROWSER,
                excluded_predefined_functions=excluded_functions,
            )
        ),
    ],
)

contents=[
    Content(
        role="user",
        parts=[
            Part(text="Search for highly rated smart fridges on Google Shopping."),
        ],
    )
]

response = client.models.generate_content(
    model='gemini-2.5-computer-use-preview-10-2025',
    contents=contents,
    config=generate_content_config,
)

print(response)

JavaScript

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({});

// Specify predefined functions to exclude (optional)
const excludedFunctions = ["drag_and_drop"];

const response = await ai.models.generateContent({
  model: 'gemini-2.5-computer-use-preview-10-2025',
  contents: [
    {
      role: 'user',
      parts: [{ text: "Search for highly rated smart fridges on Google Shopping." }]
    }
  ],
  config: {
    tools: [{
      computerUse: {
        environment: "ENVIRONMENT_BROWSER",
        excludedPredefinedFunctions: excludedFunctions
      }
    }]
  }
});

console.log(response);

2. Receive the model response

The model responds by suggesting a function call. For Gemini 3.5 Flash, the response includes a reasoning intent alongside the coordinates. Examples of both response formats are shown below:

Gemini 3.5 Flash

{
  "function_call": {
    "name": "click",
    "args": {
      "x": 450,
      "y": 120,
      "intent": "Click the search box to type the destination."
    }
  }
}

Gemini 2.5 (legacy)

{
  "content": {
    "parts": [
      {
        "text": "I will type the search query into the search bar."
      },
      {
        "function_call": {
          "name": "type_text_at",
          "args": {
            "x": 371,
            "y": 470,
            "text": "highly rated smart fridges",
            "press_enter": true
          }
        }
      }
    ]
  }
}
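As a rough illustration of reading these two shapes, the sketch below works on plain dictionaries that mirror the JSON shown above, not on SDK response objects. The helper name `extract_actions` is hypothetical, and real SDK responses expose typed objects rather than dicts.

```python
from typing import Any, Dict, List, Tuple


def extract_actions(message: Dict[str, Any]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Collect text parts and function calls from a JSON-like response.

    Accepts either a bare {"function_call": ...} object or a
    {"content": {"parts": [...]}} wrapper, as in the samples above.
    """
    if "function_call" in message:
        parts = [message]
    else:
        parts = message.get("content", {}).get("parts", [])

    texts: List[str] = []
    calls: List[Dict[str, Any]] = []
    for part in parts:
        if "text" in part:
            texts.append(part["text"])
        if "function_call" in part:
            calls.append(part["function_call"])
    return texts, calls


# Usage with the Gemini 3.5 Flash sample: the intent travels inside the args.
texts, calls = extract_actions({
    "function_call": {
        "name": "click",
        "args": {"x": 450, "y": 120, "intent": "Click the search box to type the destination."},
    }
})
print(calls[0]["name"], calls[0]["args"]["intent"])
```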

3. Execute the received actions

Your application code must parse the model response, execute the actions, and collect the results.

The following code handles both the legacy tool commands (click_at, type_text_at) and the simplified Gemini 3.5 Flash commands (click, type).

Python

from typing import Any, List, Tuple
import time

def denormalize_x(x: int, screen_width: int) -> int:
    """Convert normalized x coordinate (0-1000) to actual pixel coordinate."""
    return int(x / 1000 * screen_width)

def denormalize_y(y: int, screen_height: int) -> int:
    """Convert normalized y coordinate (0-1000) to actual pixel coordinate."""
    return int(y / 1000 * screen_height)

def execute_function_calls(candidate, page, screen_width, screen_height):
    results = []
    function_calls = []

    # Parse content parts (Handling legacy and Gemini 3 response structures)
    parts = candidate.content.parts if hasattr(candidate, 'content') else []
    if not parts and hasattr(candidate, 'function_calls'):
        function_calls = candidate.function_calls
    else:
        for part in parts:
            if part.function_call:
                function_calls.append(part.function_call)

    for function_call in function_calls:
        action_result = {}
        fname = function_call.name
        args = function_call.args
        print(f"  -> Executing: {fname} (Intent: {args.get('intent', 'N/A')})")

        try:
            if fname in ("open_web_browser", "open_app"):
                pass # Handled / already open
            elif fname in ("click", "click_at", "double_click", "triple_click", "middle_click", "right_click", "move", "long_press"):
                actual_x = denormalize_x(args["x"], screen_width)
                actual_y = denormalize_y(args["y"], screen_height)

                if fname in ("click", "click_at"):
                    page.mouse.click(actual_x, actual_y)
                elif fname == "double_click":
                    page.mouse.dblclick(actual_x, actual_y)
                elif fname == "right_click":
                    page.mouse.click(actual_x, actual_y, button="right")
                elif fname == "middle_click":
                    page.mouse.click(actual_x, actual_y, button="middle")
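                elif fname == "move":
                    page.mouse.move(actual_x, actual_y)
                elif fname == "triple_click":
                    page.mouse.click(actual_x, actual_y, click_count=3)
                elif fname == "long_press":
                    # Hold the button for the requested duration (default 2 seconds)
                    page.mouse.move(actual_x, actual_y)
                    page.mouse.down()
                    time.sleep(args.get("seconds", 2))
                    page.mouse.up()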
            elif fname in ("type", "type_text_at"):
                actual_x = denormalize_x(args["x"], screen_width) if "x" in args else None
                actual_y = denormalize_y(args["y"], screen_height) if "y" in args else None
                text = args["text"]
                press_enter = args.get("press_enter", False)

                if actual_x is not None and actual_y is not None:
                    page.mouse.click(actual_x, actual_y)
                # Clear field first (Meta+A is the macOS shortcut; use Control+A on Windows and Linux)
                page.keyboard.press("Meta+A")
                page.keyboard.press("Backspace")
                page.keyboard.type(text)
                if press_enter:
                    page.keyboard.press("Enter")
            elif fname == "navigate":
                page.goto(args["url"])
            elif fname == "go_back":
                page.go_back()
            elif fname == "go_forward":
                page.go_forward()
            elif fname == "wait":
                time.sleep(args.get("seconds", 1))
            else:
                print(f"Warning: Custom or unhandled function {fname}")

            page.wait_for_load_state(timeout=5000)
            time.sleep(1)

        except Exception as e:
            print(f"Error executing {fname}: {e}")
            action_result = {"error": str(e)}

        results.append((fname, function_call.id, action_result))

    return results

JavaScript

function denormalizeX(x, screenWidth) {
    // Convert normalized x coordinate (0-1000) to actual pixel coordinate.
    return Math.floor((x / 1000) * screenWidth);
}

function denormalizeY(y, screenHeight) {
    // Convert normalized y coordinate (0-1000) to actual pixel coordinate.
    return Math.floor((y / 1000) * screenHeight);
}

async function executeFunctionCalls(candidate, page, screenWidth, screenHeight) {
    const results = [];
    let functionCalls = [];

    // Parse function calls from candidate response
    const parts = candidate.content?.parts || [];
    if (parts.length === 0 && candidate.functionCalls) {
        functionCalls = candidate.functionCalls;
    } else {
        for (const part of parts) {
            if (part.functionCall) {
                functionCalls.push(part.functionCall);
            }
        }
    }

    for (const functionCall of functionCalls) {
        const actionResult = {};
        const fname = functionCall.name;
        const args = functionCall.args;
        console.log(`  -> Executing: ${fname} (Intent: ${args.intent || 'N/A'})`);

        try {
            if (fname === "open_web_browser" || fname === "open_app") {
                // Handled / already open
            } else if (["click", "click_at", "double_click", "triple_click", "middle_click", "right_click", "move", "long_press"].includes(fname)) {
                const actualX = denormalizeX(args.x, screenWidth);
                const actualY = denormalizeY(args.y, screenHeight);

                if (fname === "click" || fname === "click_at") {
                    await page.mouse.click(actualX, actualY);
                } else if (fname === "double_click") {
                    await page.mouse.dblclick(actualX, actualY);
                } else if (fname === "right_click") {
                    await page.mouse.click(actualX, actualY, { button: "right" });
                } else if (fname === "middle_click") {
                    await page.mouse.click(actualX, actualY, { button: "middle" });
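                } else if (fname === "move") {
                    await page.mouse.move(actualX, actualY);
                } else if (fname === "triple_click") {
                    await page.mouse.click(actualX, actualY, { clickCount: 3 });
                } else if (fname === "long_press") {
                    // Hold the button for the requested duration (default 2 seconds)
                    await page.mouse.move(actualX, actualY);
                    await page.mouse.down();
                    await new Promise(resolve => setTimeout(resolve, (args.seconds || 2) * 1000));
                    await page.mouse.up();
                }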
            } else if (fname === "type" || fname === "type_text_at") {
                const actualX = args.x !== undefined ? denormalizeX(args.x, screenWidth) : null;
                const actualY = args.y !== undefined ? denormalizeY(args.y, screenHeight) : null;
                const text = args.text;
                const pressEnter = args.press_enter || false;

                if (actualX !== null && actualY !== null) {
                    await page.mouse.click(actualX, actualY);
                }
                // Clear field first (Meta+A is the macOS shortcut; use Control+A on Windows and Linux)
                await page.keyboard.press("Meta+A");
                await page.keyboard.press("Backspace");
                await page.keyboard.type(text);
                if (pressEnter) {
                    await page.keyboard.press("Enter");
                }
            } else if (fname === "navigate") {
                await page.goto(args.url);
            } else if (fname === "go_back") {
                await page.goBack();
            } else if (fname === "go_forward") {
                await page.goForward();
            } else if (fname === "wait") {
                await new Promise(resolve => setTimeout(resolve, (args.seconds || 1) * 1000));
            } else {
                console.log(`Warning: Custom or unhandled function ${fname}`);
            }

            await page.waitForLoadState('load', { timeout: 5000 }).catch(() => {});
            await new Promise(resolve => setTimeout(resolve, 1000));
        } catch (e) {
            console.log(`Error executing ${fname}: ${e}`);
            actionResult.error = e.message;
        }

        results.push([fname, functionCall.id, actionResult]);
    }

    return results;
}

4. Capture the new environment state

Capture a screenshot and send it back to the model.

Python

import base64
import json

def get_function_responses(page, results):
    screenshot_bytes = page.screenshot(type="png")
    current_url = page.url
    function_responses = []
    for name, call_id, result in results:
        function_responses.append({
            "type": "function_result",
            "name": name,
            "call_id": call_id,
            "result": [
                {
                    "type": "text",
                    "text": json.dumps({"url": current_url, **result})
                },
                {
                    "type": "image",
                    "data": base64.b64encode(screenshot_bytes).decode("utf-8"),
                    "mime_type": "image/png"
                }
            ]
        })
    return function_responses

JavaScript

async function getFunctionResponses(page, results) {
    const screenshotBuffer = await page.screenshot({ type: 'png' });
    const screenshotBase64 = screenshotBuffer.toString('base64');
    const currentUrl = page.url();
    const functionResponses = [];

    for (const [name, callId, result] of results) {
        functionResponses.push({
            type: "function_result",
            name: name,
            call_id: callId,
            result: [
                {
                    type: "text",
                    text: JSON.stringify({ url: currentUrl, ...result })
                },
                {
                    type: "image",
                    data: screenshotBase64,
                    mime_type: "image/png"
                }
            ]
        });
    }
    return functionResponses;
}

Once you have defined how to capture and format the environment state, you can combine all of these steps into a continuous execution loop.

Build an agent loop

To enable multi-step interactions, combine the four steps from the "How to implement Computer Use" section into a single loop. The loop keeps requesting actions and sending results back to the model until the task is complete.

Remember to manage the conversation history correctly by appending both the model responses and your function responses to the history at each step.

Python

import time
from typing import Any, List, Tuple
from playwright.sync_api import sync_playwright
from google import genai
from google.genai import types

client = genai.Client()

SCREEN_WIDTH = 1440
SCREEN_HEIGHT = 900

print("Initializing browser...")
playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=False)
context = browser.new_context(viewport={"width": SCREEN_WIDTH, "height": SCREEN_HEIGHT})
page = context.new_page()

# Paste helper functions execute_function_calls and get_function_responses here

try:
    page.goto("https://ai.google.dev/gemini-api/docs")

    config = types.GenerateContentConfig(
        tools=[types.Tool(computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER,
            enable_prompt_injection_detection=True
        ))],
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    )

    initial_screenshot = page.screenshot(type="png")
    USER_PROMPT = "Go to ai.google.dev/gemini-api/docs and search for pricing."
    print(f"Goal: {USER_PROMPT}")

    contents = [
        types.Content(role="user", parts=[
            types.Part(text=USER_PROMPT),
            types.Part.from_bytes(data=initial_screenshot, mime_type='image/png')
        ])
    ]

    # Agent Loop
    turn_limit = 5
    for i in range(turn_limit):
        print(f"\n--- Turn {i+1} ---")
        print("Thinking...")
        response = client.models.generate_content(
            model='gemini-3.5-flash',
            contents=contents,
            config=config,
        )

        candidate = response.candidates[0]
        contents.append(candidate.content)

        has_function_calls = any(part.function_call for part in candidate.content.parts)
        if not has_function_calls:
            text_response = " ".join(
                part.text for part in candidate.content.parts if part.text
            )
            print("Agent finished:", text_response)
            break

        print("Executing actions...")
        results = execute_function_calls(candidate, page, SCREEN_WIDTH, SCREEN_HEIGHT)

        print("Capturing state...")
        function_responses = get_function_responses(page, results)

        contents.append(
            types.Content(role="user", parts=[types.Part(function_response=fr) for fr in function_responses])
        )

finally:
    print("Closing browser...")
    browser.close()
    playwright.stop()

JavaScript

import { chromium } from 'playwright';
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({});

// Constants for screen dimensions
const SCREEN_WIDTH = 1440;
const SCREEN_HEIGHT = 900;

console.log("Initializing browser...");
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext({
    viewport: { width: SCREEN_WIDTH, height: SCREEN_HEIGHT }
});
const page = await context.newPage();

// Define helper functions. Copy/paste from steps 3 and 4:
// function denormalizeX(...)
// function denormalizeY(...)
// async function executeFunctionCalls(...)
// async function getFunctionResponses(...)

try {
    await page.goto("https://ai.google.dev/gemini-api/docs");

    const config = {
        tools: [{
            computerUse: {
                environment: "ENVIRONMENT_BROWSER",
                enable_prompt_injection_detection: true
            }
        }],
        thinkingConfig: { includeThoughts: true }
    };

    const initialScreenshotBuffer = await page.screenshot({ type: 'png' });
    const initialScreenshotBase64 = initialScreenshotBuffer.toString('base64');
    const USER_PROMPT = "Go to ai.google.dev/gemini-api/docs and search for pricing.";
    console.log(`Goal: ${USER_PROMPT}`);

    const contents = [
        {
            role: "user",
            parts: [
                { text: USER_PROMPT },
                {
                    inlineData: {
                        data: initialScreenshotBase64,
                        mimeType: "image/png"
                    }
                }
            ]
        }
    ];

    // Agent Loop
    const turnLimit = 5;
    for (let i = 0; i < turnLimit; i++) {
        console.log(`\n--- Turn ${i + 1} ---`);
        console.log("Thinking...");
        const response = await ai.models.generateContent({
            model: 'gemini-3.5-flash',
            contents: contents,
            config: config
        });

        const candidate = response.candidates[0];
        contents.push(candidate.content);

        const hasFunctionCalls = candidate.content.parts.some(part => part.functionCall);
        if (!hasFunctionCalls) {
            const textResponse = candidate.content.parts
                .filter(part => part.text)
                .map(part => part.text)
                .join(" ");
            console.log("Agent finished:", textResponse);
            break;
        }

        console.log("Executing actions...");
        const results = await executeFunctionCalls(candidate, page, SCREEN_WIDTH, SCREEN_HEIGHT);

        console.log("Capturing state...");
        const functionResponses = await getFunctionResponses(page, results);

        contents.push({
            role: "user",
            parts: functionResponses.map(fr => ({
                ...fr
            }))
        });
    }
} finally {
    console.log("Closing browser...");
    await browser.close();
}

Supported environments (Gemini 3.5 Flash)

Gemini 3.5 Flash supports three environments, specified in the computer_use configuration:

Browser environment (`ENVIRONMENT_BROWSER`)

Actions available in the browser environment:

Command name	Description	Arguments (in function call)
click	Left-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
double_click	Double-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
triple_click	Triple-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
middle_click	Middle-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
right_click	Right-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
mouse_down	Presses and holds the mouse button at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
mouse_up	Releases the mouse button at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
move	Moves the cursor to the specified position.	`y`: int (0-999), `x`: int (0-999), `intent`: str
type	Types text.	`text`: str, `press_enter`: bool (optional, default `false`), `intent`: str
drag_and_drop	Drags an item from the start coordinates to the end coordinates.	`start_y`: int (0-999), `start_x`: int (0-999), `end_y`: int (0-999), `end_x`: int (0-999), `intent`: str
wait	Pauses execution for the specified duration (in seconds).	`seconds`: int (optional, default `1`), `intent`: str
press_key	Presses and releases the specified key.	`key`: str, `intent`: str
key_down	Presses and holds the specified key.	`key`: str, `intent`: str
key_up	Releases the specified key.	`key`: str, `intent`: str
hotkey	Presses the specified key combination.	`keys`: `List[str]`, `intent`: `str`
take_screenshot	Returns a screenshot of the current screen.	`intent`: str
scroll	Scrolls up, down, left, or right by a pixel distance.	`y`: int (0-999), `x`: int (0-999), `direction`: str (`"up"`, `"down"`, `"left"`, `"right"`), `magnitude_in_pixels`: int (0-999, optional, default `300`), `intent`: str
go_back	Goes back to the previous web page in the browser history.	`intent`: str
navigate	Navigates directly to the specified URL.	`url`: str, `intent`: str
go_forward	Goes forward to the next web page in the browser history.	`intent`: str
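The scroll action in the table above is not covered by the handler in step 3. One possible way to support it, sketched here under the assumption that Playwright is the client-side handler, is to turn `direction` and `magnitude_in_pixels` into wheel deltas. The helper name `scroll_delta` is hypothetical.

```python
from typing import Tuple


def scroll_delta(direction: str, magnitude_in_pixels: int = 300) -> Tuple[int, int]:
    """Convert a scroll action into (delta_x, delta_y) wheel deltas.

    Positive delta_y scrolls down and positive delta_x scrolls right,
    which matches the convention of Playwright's mouse.wheel().
    """
    unit = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    if direction not in unit:
        raise ValueError(f"Unsupported scroll direction: {direction!r}")
    dx, dy = unit[direction]
    return dx * magnitude_in_pixels, dy * magnitude_in_pixels
```

In a Playwright-based handler, the result could be applied by moving the cursor to the denormalized `x` and `y` with `page.mouse.move(...)` and then calling `page.mouse.wheel(delta_x, delta_y)`.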

Mobile environment (`ENVIRONMENT_MOBILE`)

Environment actions optimized for Android:

Command name	Description	Arguments (in function call)
open_app	Opens an app by its name.	`app_name`: str, `intent`: str
click	Taps at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
list_apps	Lists the apps available on the device and returns their names and package names.	`intent`: str
wait	Pauses execution for the specified duration (in seconds).	`seconds`: int (optional, default `1`), `intent`: str
go_back	Goes back to the previous screen or web page.	`intent`: str
type	Types text.	`text`: str, `press_enter`: bool (optional, default `false`), `intent`: str
drag_and_drop	Drags an item from the start coordinates to the end coordinates.	`start_y`: int (0-999), `start_x`: int (0-999), `end_y`: int (0-999), `end_x`: int (0-999), `intent`: str
long_press	Performs a long press at a coordinate on the screen.	`y`: int (0-999), `x`: int (0-999), `seconds`: int (optional, default `2`), `intent`: str
press_key	Presses and releases the specified key.	`key`: str, `intent`: str
take_screenshot	Returns a screenshot of the current screen.	`intent`: str
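The examples on this page use Playwright for the browser. For the mobile environment, one option (an assumption of this sketch, not something this guide prescribes) is to drive an Android device through `adb shell input`. The helper below only builds the command line for a few of the actions above; the function name `adb_command` is hypothetical, and `press_enter` is ignored for brevity.

```python
from typing import Any, Dict, List


def adb_command(name: str, args: Dict[str, Any], width: int, height: int) -> List[str]:
    """Build an `adb shell input ...` command for a subset of mobile actions.

    Coordinates arrive normalized (0-999) and are scaled to device pixels.
    """
    base = ["adb", "shell", "input"]
    if name in ("click", "long_press"):
        x = args["x"] * width // 1000
        y = args["y"] * height // 1000
        if name == "click":
            return base + ["tap", str(x), str(y)]
        # A swipe that starts and ends at the same point acts as a long press.
        duration_ms = int(args.get("seconds", 2) * 1000)
        return base + ["swipe", str(x), str(y), str(x), str(y), str(duration_ms)]
    if name == "go_back":
        return base + ["keyevent", "KEYCODE_BACK"]
    if name == "type":
        # `input text` expects spaces to be written as %s.
        return base + ["text", args["text"].replace(" ", "%s")]
    raise ValueError(f"Action not covered by this sketch: {name}")
```

The returned list can be passed to `subprocess.run(...)` on a machine where adb is installed and a device or emulator is connected.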

Desktop environment (`ENVIRONMENT_DESKTOP`)

OS-level cursor commands in desktop environments:

Command name	Description	Arguments (in function call)
click	Left-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
double_click	Double-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
triple_click	Triple-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
middle_click	Middle-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
right_click	Right-clicks at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
mouse_down	Presses and holds the mouse button at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
mouse_up	Releases the mouse button at the coordinates.	`y`: int (0-999), `x`: int (0-999), `intent`: str
move	Moves the cursor to the specified position.	`y`: int (0-999), `x`: int (0-999), `intent`: str
type	Types text.	`text`: str, `press_enter`: bool (optional, default `false`), `intent`: str
drag_and_drop	Drags an item from the start coordinates to the end coordinates.	`start_y`: int (0-999), `start_x`: int (0-999), `end_y`: int (0-999), `end_x`: int (0-999), `intent`: str
wait	Pauses execution for the specified duration (in seconds).	`seconds`: int (optional, default `1`), `intent`: str
press_key	Presses and releases the specified key.	`key`: str, `intent`: str
key_down	Presses and holds the specified key.	`key`: str, `intent`: str
key_up	Releases the specified key.	`key`: str, `intent`: str
hotkey	Presses the specified key combination.	`keys`: `List[str]`, `intent`: `str`
take_screenshot	Returns a screenshot of the current screen.	`intent`: str
scroll	Scrolls up, down, left, or right by a pixel distance.	`y`: int (0-999), `x`: int (0-999), `direction`: str (`"up"`, `"down"`, `"left"`, `"right"`), `magnitude_in_pixels`: int (0-999, optional, default `300`), `intent`: str

Supported legacy UI actions (Gemini 2.5)

For legacy models (gemini-2.5-computer-use-preview-10-2025), the following actions are supported:

Command name	Description	Arguments (in function call)	Example function call
open_web_browser	Opens the web browser.	None	`{"name": "open_web_browser", "args": {}}`
wait_5_seconds	Pauses execution for 5 seconds.	None	`{"name": "wait_5_seconds", "args": {}}`
go_back	Navigates to the previous page in the browser history.	None	`{"name": "go_back", "args": {}}`
go_forward	Navigates to the next page in the browser history.	None	`{"name": "go_forward", "args": {}}`
search	Navigates to the default search engine.	None	`{"name": "search", "args": {}}`
navigate	Navigates the browser directly to the specified URL.	`url`: str	`{"name": "navigate", "args": {"url": "https://www.wikipedia.org"}}`
click_at	Clicks at a specific coordinate.	`y`: int (0-999), `x`: int (0-999)	`{"name": "click_at", "args": {"y": 300, "x": 500}}`
hover_at	Hovers the mouse at a specific coordinate.	`y`: int (0-999), `x`: int (0-999)	`{"name": "hover_at", "args": {"y": 150, "x": 250}}`
type_text_at	Types text at a coordinate.	`y`: int (0-999), `x`: int (0-999), `text`: str, `press_enter`: bool (optional, default True), `clear_before_typing`: bool (optional, default True)	`{"name": "type_text_at", "args": {"y": 250, "x": 400, "text": "search", "press_enter": false}}`
key_combination	Presses a key or a combination of keys.	`keys`: str	`{"name": "key_combination", "args": {"keys": "Control+A"}}`
scroll_document	Scrolls the entire web page.	`direction`: str	`{"name": "scroll_document", "args": {"direction": "down"}}`
scroll_at	Scrolls at the coordinate (x, y).	`y`: int, `x`: int, `direction`: str, `magnitude`: int (optional, default 800)	`{"name": "scroll_at", "args": {"y": 500, "x": 500, "direction": "down"}}`
drag_and_drop	Drags an element between two coordinates.	`y`: int, `x`: int, `destination_y`: int, `destination_x`: int	`{"name": "drag_and_drop", "args": {"y": 100, "destination_y": 500, "destination_x": 500, "x": 100}}`
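
One possible way for a client to execute function calls shaped like the examples in the table is a name-to-handler dispatch table. The sketch below is illustrative only: the handlers append to a list instead of driving a real browser, and the names `HANDLERS`, `dispatch`, and `executed` are hypothetical.

```python
from typing import Any, Callable

executed: list[str] = []  # stand-in for real browser side effects


def click_at(x: int, y: int) -> None:
    executed.append(f"click_at({x}, {y})")


def navigate(url: str) -> None:
    executed.append(f"navigate({url})")


def wait_5_seconds() -> None:
    executed.append("wait_5_seconds()")


# Map each supported action name to the function that performs it.
HANDLERS: dict[str, Callable[..., None]] = {
    "click_at": click_at,
    "navigate": navigate,
    "wait_5_seconds": wait_5_seconds,
}


def dispatch(function_call: dict[str, Any]) -> None:
    """Look up the handler by name and call it with the supplied args."""
    name = function_call["name"]
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unsupported action: {name}")
    handler(**function_call.get("args", {}))


dispatch({"name": "navigate", "args": {"url": "https://www.wikipedia.org"}})
dispatch({"name": "click_at", "args": {"y": 300, "x": 500}})
dispatch({"name": "wait_5_seconds", "args": {}})
print(executed)
```

Rejecting unknown names explicitly keeps the client from silently ignoring an action it cannot perform.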

User-defined custom functions

You can extend the model's capabilities by adding user-defined custom functions. For example, in human-in-the-loop (HITL) scenarios, you can exclude the default predefined actions and register custom actions.

Gemini 3.5 Flash custom tool

Python

Exclude standard predefined actions (such as click) and register a custom yield_to_user tool:

from google import genai
from google.genai import types

client = genai.Client()

yield_to_user_tool = types.FunctionDeclaration(
    name="yield_to_user",
    description="Yields control back to the user for assistance or verification when an automated action is unsafe or ambiguous.",
    parameters=types.Schema(
        type="OBJECT",
        properties={
            "reason": types.Schema(
                type="STRING",
                description="The reason why the agent is yielding control to the human."
            )
        },
        required=["reason"]
    )
)

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Click the submit button. If you need a second factor authentication code, ask me.",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                computer_use=types.ComputerUse(
                    environment="ENVIRONMENT_MOBILE",
                    excluded_predefined_functions=["click"]
                )
            ),
            # Custom function declarations are wrapped in a Tool.
            types.Tool(function_declarations=[yield_to_user_tool])
        ]
    )
)
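
When the model calls the custom `yield_to_user` function, the client has to route the request to a human and return the answer. The following is a rough sketch under stated assumptions: the function call is represented as a plain dictionary, and `handle_function_call`, `ask_user`, and the `user_input` response field are hypothetical names rather than part of the SDK.

```python
from typing import Any, Callable


def handle_function_call(
    function_call: dict[str, Any],
    ask_user: Callable[[str], str],
) -> dict[str, Any]:
    """Route a yield_to_user call to a human; flag everything else as unhandled."""
    if function_call["name"] == "yield_to_user":
        reason = function_call["args"]["reason"]
        answer = ask_user(f"The agent needs your help: {reason}")
        # The response shape is an assumption made for this sketch.
        return {"name": "yield_to_user", "response": {"user_input": answer}}
    return {"name": function_call["name"], "response": {"status": "not_handled_here"}}


# A real client could pass the built-in input(); a stub keeps the example non-interactive.
call = {"name": "yield_to_user", "args": {"reason": "A second-factor code is required."}}
result = handle_function_call(call, ask_user=lambda prompt: "123456")
print(result)
```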

Gemini 2.5 custom tools (Legacy)

Python

from typing import Optional, Dict, Any
from google import genai
from google.genai import types

client = genai.Client()

# Define custom tools here
custom_functions = [...] # Describe parameters as FunctionDeclaration object

def make_generate_content_config():
    excluded_functions = ["open_web_browser", "wait_5_seconds", "go_back", "go_forward", "search", "navigate", "hover_at", "scroll_document", "key_combination", "drag_and_drop"]
    generate_content_config = types.GenerateContentConfig(
        tools=[
            types.Tool(
                computer_use=types.ComputerUse(
                    environment=types.Environment.ENVIRONMENT_BROWSER,
                    excluded_predefined_functions=excluded_functions
                )
            ),
            types.Tool(function_declarations=custom_functions)
        ]
    )
    return generate_content_config

Managing thinking levels (Gemini 3.5 Flash)

For Computer Use agents, you can configure different thinking levels to balance action quality against execution speed. Lower thinking levels usually offer a good balance for standard automation tasks.

Safety and security

Configuring safety policies (Gemini 3.5 Flash)

The Gemini 3.5 Flash model includes a built-in safety classifier that automatically determines whether user confirmation is required.

Safety policy category	Description
`FINANCIAL_TRANSACTIONS`	Blocks, or requires confirmation for, actions involving payments, retail checkout, or regulated goods.
`SENSITIVE_DATA_MODIFICATION`	Protects health, financial, or government records from unauthorized changes.
`COMMUNICATION_TOOL`	Restricts the agent from automatically sending emails, chat messages, or drafts.
`ACCOUNT_CREATION`	Restricts the agent from automatically registering new accounts on websites.
`DATA_MODIFICATION`	Governs general file system changes, data sharing, and storage deletion.
`USER_CONSENT_MANAGEMENT`	Requires user takeover for cookie consent banners and privacy prompts.
`LEGAL_TERMS_AND_AGREEMENTS`	Prevents the model from automatically accepting terms of service or legally binding agreements.

Safety overrides

You can override select policies by passing overrides:

Python

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Clean up the local folder by archiving old logs.",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                computer_use=types.ComputerUse(
                    environment=types.Environment.ENVIRONMENT_DESKTOP,
                    disabled_safety_policies=[
                        types.SafetyPolicy.DATA_MODIFICATION
                    ]
                )
            )
        ]
    )
)

JavaScript

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI();

const response = await ai.models.generateContent({
  model: 'gemini-3.5-flash',
  contents: "Clean up the local folder by archiving old logs.",
  config: {
    tools: [{
      computerUse: {
        environment: "ENVIRONMENT_DESKTOP",
        disabledSafetyPolicies: [
          "DATA_MODIFICATION"
        ]
      }
    }]
  }
});
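
Because each disabled policy removes a confirmation step, it may help to keep the override list explicit and checked before it reaches the request. The sketch below validates requested overrides against the category names listed in the safety policy table. The helper `validate_disabled_policies` is hypothetical and does not call the API.

```python
# Category names copied from the safety policy table in this section.
KNOWN_SAFETY_POLICIES = frozenset({
    "FINANCIAL_TRANSACTIONS",
    "SENSITIVE_DATA_MODIFICATION",
    "COMMUNICATION_TOOL",
    "ACCOUNT_CREATION",
    "DATA_MODIFICATION",
    "USER_CONSENT_MANAGEMENT",
    "LEGAL_TERMS_AND_AGREEMENTS",
})


def validate_disabled_policies(requested: list[str]) -> list[str]:
    """Return a sorted, de-duplicated override list, rejecting unknown names."""
    unknown = sorted(set(requested) - KNOWN_SAFETY_POLICIES)
    if unknown:
        raise ValueError(f"Unknown safety policies: {unknown}")
    return sorted(set(requested))


overrides = validate_disabled_policies(["DATA_MODIFICATION", "DATA_MODIFICATION"])
print(overrides)
```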

Prompt injection detection (Gemini 3.5 Flash)

An opt-in safety mechanism that scans screenshot pixels for hidden adversarial instructions (for example, "ignore previous instructions") and blocks execution if any are detected.

Acknowledge safety decision

The response can include a safety_decision parameter in the function call arguments:

{
  "function_call": {
    "name": "click_at",
    "args": {
      "x": 60,
      "y": 100,
      "safety_decision": {
        "explanation": "Must check check-box",
        "decision": "require_confirmation"
      }
    }
  }
}

If the safety_decision is require_confirmation, ask the end user for confirmation. If the user confirms, set safety_acknowledgement in the FunctionResponse.

Python

def get_safety_confirmation(safety_decision):
    # Prompt user for confirmation
    print(f"Safety confirmation required: {safety_decision.get('explanation', '')}")
    return "CONTINUE" # Or TERMINATE

# Inside execute_function_calls, check for safety_decision:
if 'safety_decision' in function_call.args:
    decision = get_safety_confirmation(function_call.args['safety_decision'])
    if decision == "TERMINATE":
        break
    # Include safety_acknowledgement inside the action result
    action_result["safety_acknowledgement"] = True
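
The fragment above depends on surrounding loop state. As a self-contained illustration of the same decision flow, the sketch below takes the function call arguments as a plain dictionary and a pluggable confirmation callback. The names `resolve_safety_decision` and `confirm` are hypothetical, and treating any decision other than `require_confirmation` as needing no acknowledgement is an assumption.

```python
from typing import Any, Callable, Optional


def resolve_safety_decision(
    args: dict[str, Any],
    confirm: Callable[[str], bool],
) -> Optional[dict[str, Any]]:
    """Return extra fields for the function response, or None to stop the agent."""
    safety_decision = args.get("safety_decision")
    if safety_decision is None:
        return {}  # nothing to acknowledge
    if safety_decision.get("decision") != "require_confirmation":
        return {}  # assumption: other decisions need no acknowledgement
    if not confirm(safety_decision.get("explanation", "")):
        return None  # the user declined, so the caller should terminate
    return {"safety_acknowledgement": True}


# Arguments taken from the example response shown earlier in this section.
call_args = {
    "x": 60,
    "y": 100,
    "safety_decision": {
        "explanation": "Must check check-box",
        "decision": "require_confirmation",
    },
}
approved = resolve_safety_decision(call_args, confirm=lambda explanation: True)
denied = resolve_safety_decision(call_args, confirm=lambda explanation: False)
print(approved, denied)
```

Passing the confirmation step in as a callback keeps the logic testable and lets a real client substitute a UI prompt.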

Safety best practices

Computer Use introduces unique security and operational risks, because a model acting on a user's behalf may encounter untrusted content on screen or make mistakes when executing actions. Implement the following best practices to protect user data and systems:

Human-in-the-loop (HITL):

Implement user confirmation: When the safety response indicates require_confirmation (or a legacy safety decision requires it), ask the user for confirmation.

Provide custom safety instructions: Implement a custom system instruction to define and enforce your own safety boundaries. For example:

Python

from google import genai
from google.genai import types

system_instruction = """
## **RULE 1: Seek User Confirmation (USER_CONFIRMATION)**

This is your first and most important check. If the next required action falls
into any of the following categories, you MUST stop immediately, and seek the
user's explicit permission.

**Procedure for Seeking Confirmation:**
* **For Consequential Actions:** Perform all preparatory steps (e.g., navigating,
  filling out forms, typing a message). You will ask for confirmation **AFTER**
  all necessary information is entered on the screen, but **BEFORE** you perform
  the final, irreversible action (e.g., before clicking "Send", "Submit",
  "Confirm Purchase", "Share").
* **For Prohibited Actions:** If the action is strictly forbidden (e.g., accepting
  legal terms, solving a CAPTCHA), you must first inform the user about the
  required action and ask for their confirmation to proceed.

**USER_CONFIRMATION Categories:**

*   **Consent and Agreements:** You are FORBIDDEN from accepting, selecting, or
    agreeing to any of the following on the user's behalf. You must ask the
    user to confirm before performing these actions.
    *   Terms of Service
    *   Privacy Policies
    *   Cookie consent banners
    *   End User License Agreements (EULAs)
    *   Any other legally significant contracts or agreements.
*   **Robot Detection:** You MUST NEVER attempt to solve or bypass the
    following. You must ask the user to confirm before performing these actions.
    *   CAPTCHAs (of any kind)
    *   Any other anti-robot or human-verification mechanisms, even if you are
        capable.
*   **Financial Transactions:**
    *   Completing any purchase.
    *   Managing or moving money (e.g., transfers, payments).
    *   Purchasing regulated goods or participating in gambling.
*   **Sending Communications:**
    *   Sending emails.
    *   Sending messages on any platform (e.g., social media, chat apps).
    *   Posting content on social media or forums.
*   **Accessing or Modifying Sensitive Information:**
    *   Health, financial, or government records (e.g., medical history, tax
        forms, passport status).
    *   Revealing or modifying sensitive personal identifiers (e.g., SSN, bank
        account number, credit card number).
*   **User Data Management:**
    *   Accessing, downloading, or saving files from the web.
    *   Sharing or sending files/data to any third party.
    *   Transferring user data between systems.
*   **Browser Data Usage:**
    *   Accessing or managing Chrome browsing history, bookmarks, autofill data,
        or saved passwords.
*   **Security and Identity:**
    *   Logging into any user account.
    *   Any action that involves misrepresentation or impersonation (e.g.,
        creating a fan account, posting as someone else).
*   **Insurmountable Obstacles:** If you are technically unable to interact with
    a user interface element or are stuck in a loop you cannot resolve, ask the
    user to take over.
---

## **RULE 2: Default Behavior (ACTUATE)**

If an action does **NOT** fall under the conditions for `USER_CONFIRMATION`,
your default behavior is to **Actuate**.

**Actuation Means:**  You MUST proactively perform all necessary steps to move
the user's request forward. Continue to actuate until you either complete the
non-consequential task or encounter a condition defined in Rule 1.

*   **Example 1:** If asked to send money, you will navigate to the payment
    portal, enter the recipient's details, and enter the amount. You will then
    **STOP** as per Rule 1 and ask for confirmation before clicking the final
    "Send" button.
*   **Example 2:** If asked to post a message, you will navigate to the site,
    open the post composition window, and write the full message. You will then
    **STOP** as per Rule 1 and ask for confirmation before clicking the final
    "Post" button.

    After the user has confirmed, remember to get the user's latest screen
    before continuing to perform actions.

# Final Response Guidelines:
Write final response to the user in the following cases:
- User confirmation
- When the task is complete or you have enough information to respond to the user
"""

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Prepare a draft but do not send.",
    config=types.GenerateContentConfig(
        system_instruction=system_instruction,
        tools=[types.Tool(computer_use=types.ComputerUse(environment="ENVIRONMENT_BROWSER"))]
    )
)

JavaScript

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI();

const systemInstruction = `
## **RULE 1: Seek User Confirmation (USER_CONFIRMATION)**

This is your first and most important check. If the next required action falls
into any of the following categories, you MUST stop immediately, and seek the
user's explicit permission.

**Procedure for Seeking Confirmation:**
* **For Consequential Actions:** Perform all preparatory steps (e.g., navigating,
  filling out forms, typing a message). You will ask for confirmation **AFTER**
  all necessary information is entered on the screen, but **BEFORE** you perform
  the final, irreversible action (e.g., before clicking "Send", "Submit",
  "Confirm Purchase", "Share").
* **For Prohibited Actions:** If the action is strictly forbidden (e.g., accepting
  legal terms, solving a CAPTCHA), you must first inform the user about the
  required action and ask for their confirmation to proceed.

**USER_CONFIRMATION Categories:**

*   **Consent and Agreements:** You are FORBIDDEN from accepting, selecting, or
    agreeing to any of the following on the user's behalf. You must ask the
    user to confirm before performing these actions.
    *   Terms of Service
    *   Privacy Policies
    *   Cookie consent banners
    *   End User License Agreements (EULAs)
    *   Any other legally significant contracts or agreements.
*   **Robot Detection:** You MUST NEVER attempt to solve or bypass the
    following. You must ask the user to confirm before performing these actions.
    *   CAPTCHAs (of any kind)
    *   Any other anti-robot or human-verification mechanisms, even if you are
        capable.
*   **Financial Transactions:**
    *   Completing any purchase.
    *   Managing or moving money (e.g., transfers, payments).
    *   Purchasing regulated goods or participating in gambling.
*   **Sending Communications:**
    *   Sending emails.
    *   Sending messages on any platform (e.g., social media, chat apps).
    *   Posting content on social media or forums.
*   **Accessing or Modifying Sensitive Information:**
    *   Health, financial, or government records (e.g., medical history, tax
        forms, passport status).
    *   Revealing or modifying sensitive personal identifiers (e.g., SSN, bank
        account number, credit card number).
*   **User Data Management:**
    *   Accessing, downloading, or saving files from the web.
    *   Sharing or sending files/data to any third party.
    *   Transferring user data between systems.
*   **Browser Data Usage:**
    *   Accessing or managing Chrome browsing history, bookmarks, autofill data,
        or saved passwords.
*   **Security and Identity:**
    *   Logging into any user account.
    *   Any action that involves misrepresentation or impersonation (e.g.,
        creating a fan account, posting as someone else).
*   **Insurmountable Obstacles:** If you are technically unable to interact with
    a user interface element or are stuck in a loop you cannot resolve, ask the
    user to take over.
---

## **RULE 2: Default Behavior (ACTUATE)**

If an action does **NOT** fall under the conditions for `USER_CONFIRMATION`,
your default behavior is to **Actuate**.

**Actuation Means:**  You MUST proactively perform all necessary steps to move
the user's request forward. Continue to actuate until you either complete the
non-consequential task or encounter a condition defined in Rule 1.

*   **Example 1:** If asked to send money, you will navigate to the payment
    portal, enter the recipient's details, and enter the amount. You will then
    **STOP** as per Rule 1 and ask for confirmation before clicking the final
    "Send" button.
*   **Example 2:** If asked to post a message, you will navigate to the site,
    open the post composition window, and write the full message. You will then
    **STOP** as per Rule 1 and ask for confirmation before clicking the final
    "Post" button.

    After the user has confirmed, remember to get the user's latest screen
    before continuing to perform actions.

# Final Response Guidelines:
Write final response to the user in the following cases:
- User confirmation
- When the task is complete or you have enough information to respond to the user
`;

const response = await ai.models.generateContent({
  model: 'gemini-3.5-flash',
  contents: "Prepare a draft but do not send.",
  config: {
    systemInstruction: systemInstruction,
    tools: [{
      computerUse: {
        environment: "ENVIRONMENT_BROWSER"
      }
    }]
  }
});

Secure execution environment: Run your agent in a secure, sandboxed environment to limit its potential impact. This can be a sandboxed virtual machine (VM), a container (for example, Docker), or a dedicated browser profile with limited permissions. For guidance on a sandboxed setup using Docker, see the GitHub reference implementation.
Input sanitization: Sanitize all user-generated text in prompts to reduce the risk of unintended instructions or prompt injection. This is a useful security layer, but it is not a substitute for a secure execution environment.
Content guardrails: Use guardrails and content safety APIs to evaluate user inputs, tool inputs and outputs, and agent responses for appropriateness, prompt injection, and jailbreak attempts.
Allowlists and blocklists: Implement filtering mechanisms to control where the model can navigate and what it can do. A blocklist of prohibited websites is a good starting point, while a more restrictive allowlist is even more secure.
Observability and logging: Maintain detailed logs for debugging, auditing, and incident response. Your client should log prompts, screenshots, model-suggested actions (function_call), safety responses, and all actions ultimately executed by the client.
Environment management: Ensure the GUI environment is consistent. Unexpected pop-ups, notifications, or layout changes can confuse the model. Where possible, start each new task from a known, clean state.
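
As an illustration of the allowlist practice above, the sketch below checks a URL before the client executes a navigation action. It uses only the Python standard library. The host names in `ALLOWED_HOSTS` and the helper `is_navigation_allowed` are hypothetical, and a production filter would likely need more policy than this.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; subdomains of these hosts are also accepted.
ALLOWED_HOSTS = {"www.wikipedia.org", "example.com"}


def is_navigation_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host is on the allowlist or a subdomain of it."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return any(host == allowed or host.endswith("." + allowed) for allowed in ALLOWED_HOSTS)


print(is_navigation_allowed("https://www.wikipedia.org/wiki/Gemini"))
print(is_navigation_allowed("https://example.com.evil.net/login"))
```

Matching on the parsed host name, rather than on a substring of the URL, avoids accepting look-alike domains that merely contain an allowed name.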

Model versions

You can use Computer Use with the following models:

Gemini 3.5 Flash (gemini-3.5-flash): The recommended model for Computer Use, featuring simplified actions with intents, support for browser, mobile, and desktop environments, configurable safety policies, and prompt injection detection.
Gemini 3 Flash Preview (gemini-3-flash-preview): A preview model that supports Computer Use.
Gemini 2.5 (legacy preview) (gemini-2.5-computer-use-preview-10-2025): A legacy preview model optimized for browser-based Computer Use.

What's next?

Try Computer Use in the Browserbase demo environment.
See the reference implementation for example code.
Learn about other Gemini API tools:
- Function calling
- Grounding with Google Search
