The Interactions API is now generally available. We recommend using this API for access to all the latest features and models.

Get started with the Gemini Live API using WebSockets

The Gemini Live API enables real-time, bidirectional interaction with Gemini models, supporting audio, video, and text inputs as well as native audio outputs. This guide explains how to integrate directly with the API using raw WebSockets.

Try the Live API in Google AI Studio. Clone the example app from GitHub. Use coding agent skills.

Overview

The Gemini Live API uses WebSockets for real-time communication. Unlike using an SDK, this approach involves managing the WebSocket connection directly and sending and receiving messages in a specific JSON format defined by the API.

Key concepts:

WebSocket Endpoint: The specific URL to connect to.
Message Format: All communication happens through JSON messages that conform to the BidiGenerateContentClientMessage and BidiGenerateContentServerMessage structures.
Session Management: You are responsible for maintaining the WebSocket connection.
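
As a rough illustration of the message format, the type of a message is indicated by its top-level JSON field. The sketch below uses only the field names that appear later in this guide (setup, realtimeInput, toolResponse, serverContent, toolCall); the full API defines more. The helper name message_kind is hypothetical and not part of the API.

```python
import json

# Top-level fields used in this guide; the full API defines more.
CLIENT_FIELDS = ("setup", "realtimeInput", "toolResponse")
SERVER_FIELDS = ("serverContent", "toolCall")


def message_kind(raw):
    """Return the first known top-level field of a JSON message, or None."""
    message = json.loads(raw)
    for field in CLIENT_FIELDS + SERVER_FIELDS:
        if field in message:
            return field
    return None


print(message_kind(json.dumps({"realtimeInput": {"text": "Hello"}})))
```

A receive loop could branch on the returned field name instead of testing each key inline.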

Authentication

Authentication is handled by including your API key as a query parameter in the WebSocket URL.

The endpoint format is:

wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=YOUR_API_KEY

Replace YOUR_API_KEY with your actual API key.

Authentication with ephemeral tokens

If you are using ephemeral tokens, you need to connect to the v1beta endpoint. The ephemeral token must be passed as an access_token query parameter.

The endpoint format for ephemeral tokens is:

wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContentConstrained?access_token={short-lived-token}

Replace {short-lived-token} with the actual ephemeral token.
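
The two endpoint formats above differ only in the method name and the query parameter, so one way to avoid hand-editing URLs is a small builder. This is a minimal sketch: build_ws_url and its ephemeral flag are hypothetical names chosen for illustration, and the URL pieces are copied from the formats shown above.

```python
from urllib.parse import urlencode

HOST = "wss://generativelanguage.googleapis.com/ws"
SERVICE = "google.ai.generativelanguage.v1beta.GenerativeService"


def build_ws_url(credential, ephemeral=False):
    """Build the WebSocket URL for an API key or an ephemeral token."""
    if ephemeral:
        method, param = "BidiGenerateContentConstrained", "access_token"
    else:
        method, param = "BidiGenerateContent", "key"
    # urlencode percent-escapes characters that are unsafe in a query string.
    return f"{HOST}/{SERVICE}.{method}?{urlencode({param: credential})}"


print(build_ws_url("YOUR_API_KEY"))
```

Escaping the credential matters mostly for tokens, which may contain characters such as "/" that would otherwise be misread in the query string.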

Connect to the Live API

To start a live session, establish a WebSocket connection to the authenticated endpoint. The first message sent over the WebSocket must be a BidiGenerateContentSetup that contains the config. For the full configuration options, see the Live API - WebSockets API reference.

Python

import asyncio
import websockets
import json

API_KEY = "YOUR_API_KEY"
MODEL_NAME = "gemini-3.1-flash-live-preview"
WS_URL = f"wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key={API_KEY}"

async def connect_and_configure():
    async with websockets.connect(WS_URL) as websocket:
        print("WebSocket Connected")

        # 1. Send the initial configuration
        setup_message = {
            "setup": {
                "model": f"models/{MODEL_NAME}",
                "responseModalities": ["AUDIO"],
                "systemInstruction": {
                    "parts": [{"text": "You are a helpful assistant."}]
                }
            }
        }
        await websocket.send(json.dumps(setup_message))
        print("Configuration sent")

        # Keep the session alive for further interactions
        await asyncio.sleep(3600) # Example: keep open for an hour

async def main():
    await connect_and_configure()

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

const API_KEY = "YOUR_API_KEY";
const MODEL_NAME = "gemini-3.1-flash-live-preview";
const WS_URL = `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=${API_KEY}`;

const websocket = new WebSocket(WS_URL);

websocket.onopen = () => {
  console.log('WebSocket Connected');

  // 1. Send the initial configuration
  const setupMessage = {
    setup: {
      model: `models/${MODEL_NAME}`,
      responseModalities: ['AUDIO'],
      systemInstruction: {
        parts: [{ text: 'You are a helpful assistant.' }]
      }
    }
  };
  websocket.send(JSON.stringify(setupMessage));
  console.log('Configuration sent');
};

websocket.onmessage = (event) => {
  const response = JSON.parse(event.data);
  console.log('Received:', response);
  // Handle different types of responses here
};

websocket.onerror = (error) => {
  console.error('WebSocket Error:', error);
};

websocket.onclose = () => {
  console.log('WebSocket Closed');
};
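
After sending the setup message, a client may want to wait for the server's acknowledgement before streaming input. The sketch below assumes the server replies with a message containing a setupComplete field; that field comes from the WebSockets API reference rather than from this guide, so treat it as an assumption to verify. The helper name and the 10-second default timeout are also hypothetical. It works with any object that exposes an awaitable recv(), such as a websockets connection.

```python
import asyncio
import json


async def wait_for_setup_complete(websocket, timeout=10.0):
    """Read messages until one contains "setupComplete" or the timeout expires."""

    async def _wait():
        while True:
            # recv() may return str or bytes; json.loads accepts both.
            message = json.loads(await websocket.recv())
            if "setupComplete" in message:  # assumed acknowledgement field
                return message

    # Raises a timeout error if no acknowledgement arrives in time.
    return await asyncio.wait_for(_wait(), timeout)

# Example usage, right after sending the setup message:
# await wait_for_setup_complete(websocket)
```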

Send text

To send text, construct a BidiGenerateContentRealtimeInput message with the text field.

Python

# Inside the websocket context
async def send_text(websocket, text):
    text_message = {
        "realtimeInput": {
            "text": text
        }
    }
    await websocket.send(json.dumps(text_message))
    print(f"Sent text: {text}")

# Example usage: await send_text(websocket, "Hello, how are you?")

JavaScript

function sendTextMessage(text) {
  if (websocket.readyState === WebSocket.OPEN) {
    const textMessage = {
      realtimeInput: {
        text: text
      }
    };
    websocket.send(JSON.stringify(textMessage));
    console.log('Text message sent:', text);
  } else {
    console.warn('WebSocket not open.');
  }
}

// Example usage:
sendTextMessage("Hello, how are you?");

Send audio

Audio must be sent as raw PCM data (raw 16-bit PCM audio, 16kHz, little-endian). Construct a BidiGenerateContentRealtimeInput message with the audio data. The mimeType is essential.

Python

# Inside the websocket context
async def send_audio_chunk(websocket, chunk_bytes):
    import base64
    encoded_data = base64.b64encode(chunk_bytes).decode('utf-8')
    audio_message = {
        "realtimeInput": {
            "audio": {
                "data": encoded_data,
                "mimeType": "audio/pcm;rate=16000"
            }
        }
    }
    await websocket.send(json.dumps(audio_message))
    # print("Sent audio chunk") # Avoid excessive logging

# Assuming 'chunk' is your raw PCM audio bytes
# await send_audio_chunk(websocket, chunk)

JavaScript

// Assuming 'chunk' is a Buffer of raw PCM audio
function sendAudioChunk(chunk) {
  if (websocket.readyState === WebSocket.OPEN) {
    const audioMessage = {
      realtimeInput: {
        audio: {
          data: chunk.toString('base64'),
          mimeType: 'audio/pcm;rate=16000'
        }
      }
    };
    websocket.send(JSON.stringify(audioMessage));
    // console.log('Sent audio chunk');
  }
}
// Example usage: sendAudioChunk(audioBuffer);

For an example of how to capture audio from the client device (e.g. the browser), see the end-to-end example on GitHub.
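
When the audio comes from a file or a buffer rather than a live microphone, it has to be split into chunks before sending. The following sketch computes the chunk size from the 16-bit, 16kHz format described above. The 100 ms chunk length and mono audio are assumptions made for illustration, not requirements stated in this guide, and the function names are hypothetical.

```python
import base64
import json

SAMPLE_RATE = 16000    # samples per second, per the format above
BYTES_PER_SAMPLE = 2   # 16-bit PCM; mono is assumed


def pcm_chunks(pcm_bytes, chunk_ms=100):
    """Yield successive chunks of raw PCM covering chunk_ms milliseconds each."""
    chunk_size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[start:start + chunk_size]


def audio_message(chunk):
    """Serialize one chunk as a realtimeInput audio message."""
    return json.dumps({
        "realtimeInput": {
            "audio": {
                "data": base64.b64encode(chunk).decode("ascii"),
                "mimeType": f"audio/pcm;rate={SAMPLE_RATE}",
            }
        }
    })

# Example usage inside the websocket context:
# for chunk in pcm_chunks(recorded_pcm):
#     await websocket.send(audio_message(chunk))
```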

Send video

Video frames are sent as individual images (e.g., JPEG or PNG). As with audio, use realtimeInput with a Blob, specifying the correct mimeType.

Python

# Inside the websocket context
async def send_video_frame(websocket, frame_bytes, mime_type="image/jpeg"):
    import base64
    encoded_data = base64.b64encode(frame_bytes).decode('utf-8')
    video_message = {
        "realtimeInput": {
            "video": {
                "data": encoded_data,
                "mimeType": mime_type
            }
        }
    }
    await websocket.send(json.dumps(video_message))
    # print("Sent video frame")

# Assuming 'frame' is your JPEG-encoded image bytes
# await send_video_frame(websocket, frame)

JavaScript

// Assuming 'frame' is a Buffer of JPEG-encoded image data
function sendVideoFrame(frame, mimeType = 'image/jpeg') {
  if (websocket.readyState === WebSocket.OPEN) {
    const videoMessage = {
      realtimeInput: {
        video: {
          data: frame.toString('base64'),
          mimeType: mimeType
        }
      }
    };
    websocket.send(JSON.stringify(videoMessage));
    // console.log('Sent video frame');
  }
}
// Example usage: sendVideoFrame(jpegBuffer);

For an example of how to capture video from the client device (e.g. the browser), see the end-to-end example on GitHub.
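
Because the mimeType has to match the actual image encoding, it can help to derive it from the frame bytes instead of passing it by hand. This is a small sketch that checks the standard JPEG and PNG file signatures; detect_image_mime is a hypothetical helper, not part of the API.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def detect_image_mime(frame_bytes):
    """Return the MIME type for JPEG or PNG bytes, or raise ValueError."""
    if frame_bytes.startswith(JPEG_MAGIC):
        return "image/jpeg"
    if frame_bytes.startswith(PNG_MAGIC):
        return "image/png"
    raise ValueError("frame is neither JPEG nor PNG")

# Example usage with the send_video_frame function above:
# await send_video_frame(websocket, frame, detect_image_mime(frame))
```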

Receive responses

The WebSocket sends back BidiGenerateContentServerMessage messages. You need to parse these JSON messages and handle the different types of content.

Python

# Inside the websocket context, in a receive loop
async def receive_loop(websocket):
    async for message in websocket:
        response = json.loads(message)
        print("Received:", response)

        if "serverContent" in response:
            server_content = response["serverContent"]
            # Receiving Audio
            if "modelTurn" in server_content and "parts" in server_content["modelTurn"]:
                for part in server_content["modelTurn"]["parts"]:
                    if "inlineData" in part:
                        audio_data_b64 = part["inlineData"]["data"]
                        # Process or play the base64 encoded audio data
                        # audio_data = base64.b64decode(audio_data_b64)
                        print(f"Received audio data (base64 len: {len(audio_data_b64)})")

            # Receiving Text Transcriptions
            if "inputTranscription" in server_content:
                print(f"User: {server_content['inputTranscription']['text']}")
            if "outputTranscription" in server_content:
                print(f"Gemini: {server_content['outputTranscription']['text']}")

        # Handling Tool Calls
        if "toolCall" in response:
            await handle_tool_call(websocket, response["toolCall"])

# Example usage: await receive_loop(websocket)

JavaScript

websocket.onmessage = (event) => {
  const response = JSON.parse(event.data);
  console.log('Received:', response);

  if (response.serverContent) {
    const serverContent = response.serverContent;
    // Receiving Audio
    if (serverContent.modelTurn?.parts) {
      for (const part of serverContent.modelTurn.parts) {
        if (part.inlineData) {
          const audioData = part.inlineData.data; // Base64 encoded string
          // Process or play audioData
          console.log(`Received audio data (base64 len: ${audioData.length})`);
        }
      }
    }

    // Receiving Text Transcriptions
    if (serverContent.inputTranscription) {
      console.log('User:', serverContent.inputTranscription.text);
    }
    if (serverContent.outputTranscription) {
      console.log('Gemini:', serverContent.outputTranscription.text);
    }
  }

  // Handling Tool Calls
  if (response.toolCall) {
    handleToolCall(response.toolCall);
  }
};

For an example of how to handle the response, see the end-to-end example on GitHub.
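
The receive examples above stop at the base64 string. As one possible next step, the sketch below decodes the inlineData parts of a parsed server message and wraps the accumulated PCM in a WAV container using the standard library. The 24kHz output sample rate, 16-bit depth, and mono layout are assumptions that this guide does not state, so check them against the API reference. The function names are hypothetical.

```python
import base64
import io
import wave

OUTPUT_SAMPLE_RATE = 24000  # assumption: not stated in this guide


def extract_audio(response):
    """Return the decoded audio bytes carried by one parsed server message."""
    parts = response.get("serverContent", {}).get("modelTurn", {}).get("parts", [])
    return b"".join(
        base64.b64decode(part["inlineData"]["data"])
        for part in parts
        if "inlineData" in part
    )


def pcm_to_wav(pcm_bytes, sample_rate=OUTPUT_SAMPLE_RATE):
    """Wrap raw 16-bit mono PCM in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumed)
        wav_file.setsampwidth(2)   # 16-bit samples (assumed)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()

# Example usage: append extract_audio(response) to a bytearray in the
# receive loop, then write pcm_to_wav(bytes(collected)) to a .wav file.
```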

Handle tool calls

When the model requests a tool call, the BidiGenerateContentServerMessage contains a toolCall field. You must execute the function locally and send the result back over the WebSocket using a BidiGenerateContentToolResponse message.

Python

# Placeholder for your tool function
def my_tool_function(args):
    print(f"Executing tool with args: {args}")
    # Implement your tool logic here
    return {"status": "success", "data": "some result"}

async def handle_tool_call(websocket, tool_call):
    function_responses = []
    for fc in tool_call["functionCalls"]:
        # 1. Execute the function locally
        try:
            result = my_tool_function(fc.get("args", {}))
            response_data = {"result": result}
        except Exception as e:
            print(f"Error executing tool {fc['name']}: {e}")
            response_data = {"error": str(e)}

        # 2. Prepare the response
        function_responses.append({
            "name": fc["name"],
            "id": fc["id"],
            "response": response_data
        })

    # 3. Send the tool response back to the session
    tool_response_message = {
        "toolResponse": {
            "functionResponses": function_responses
        }
    }
    await websocket.send(json.dumps(tool_response_message))
    print("Sent tool response")

# This function is called within the receive_loop when a toolCall is detected.

JavaScript

// Placeholder for your tool function
function myToolFunction(args) {
  console.log(`Executing tool with args:`, args);
  // Implement your tool logic here
  return { status: 'success', data: 'some result' };
}

function handleToolCall(toolCall) {
  const functionResponses = [];
  for (const fc of toolCall.functionCalls) {
    // 1. Execute the function locally
    let response;
    try {
      response = { result: myToolFunction(fc.args || {}) };
    } catch (e) {
      console.error(`Error executing tool ${fc.name}:`, e);
      // Report failures under "error", matching the Python example
      response = { error: e.message };
    }

    // 2. Prepare the response
    functionResponses.push({
      name: fc.name,
      id: fc.id,
      response
    });
  }

  // 3. Send the tool response back to the session
  if (websocket.readyState === WebSocket.OPEN) {
    const toolResponseMessage = {
      toolResponse: {
        functionResponses: functionResponses
      }
    };
    websocket.send(JSON.stringify(toolResponseMessage));
    console.log('Sent tool response');
  } else {
    console.warn('WebSocket not open to send tool response.');
  }
}
// This function is called within websocket.onmessage when a toolCall is detected.
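
The placeholder examples above call the same function for every tool call. With more than one tool, a lookup by function name is one straightforward option. The sketch below is illustrative only: get_weather, TOOL_REGISTRY, and build_tool_response are hypothetical names, and the message shape is copied from the examples above.

```python
def get_weather(args):
    """Hypothetical tool used only for this example."""
    return {"city": args.get("city"), "forecast": "sunny"}


# Maps function names from the toolCall to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def build_tool_response(tool_call, registry=TOOL_REGISTRY):
    """Run each requested function and return a toolResponse message dict."""
    function_responses = []
    for fc in tool_call.get("functionCalls", []):
        tool = registry.get(fc["name"])
        if tool is None:
            response = {"error": f"unknown tool: {fc['name']}"}
        else:
            try:
                response = {"result": tool(fc.get("args", {}))}
            except Exception as exc:
                response = {"error": str(exc)}
        function_responses.append(
            {"name": fc["name"], "id": fc["id"], "response": response}
        )
    return {"toolResponse": {"functionResponses": function_responses}}

# Example usage inside the receive loop:
# await websocket.send(json.dumps(build_tool_response(response["toolCall"])))
```

Keeping the message construction separate from the send call makes the dispatch logic testable without an open connection.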

What's next

Read the full Live API Capabilities guide for key capabilities and configurations, including Voice Activity Detection and native audio features.
Read the Tool use guide to learn how to integrate the Live API with tools and function calling.
Read the Session management guide for managing long-running conversations.
Read the Ephemeral tokens guide for secure authentication in client-to-server applications.
For more information about the underlying WebSockets API, see the WebSockets API reference.