

Get started with the Gemini Live API using the Google GenAI SDK

The Gemini Live API enables real-time, bidirectional interaction with Gemini models. It accepts audio, video, and text input and can produce native audio output. This guide explains how to integrate with the API from a server using the Google GenAI SDK.

Try the Live API in Google AI Studio · Clone the example app from GitHub · Use coding agent skills

Overview

The Gemini Live API uses WebSockets for real-time communication. The google-genai SDK provides a high-level asynchronous interface for managing these connections.

Key concepts

Session: A persistent connection to the model.
Config: Settings for the response modality (audio/text), the voice, and system instructions.
Real-time input: Audio and video frames sent as Blobs.

Connecting to the Live API

Start a Live API session with an API key:

Python

import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

model = "gemini-3.1-flash-live-preview"
config = {"response_modalities": ["AUDIO"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print("Session started")
        # Send content...

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY"});
const model = 'gemini-3.1-flash-live-preview';
const config = { responseModalities: [Modality.AUDIO] };

async function main() {

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        console.debug(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  console.debug("Session started");
  // Send content...

  session.close();
}

main();
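The examples above use "YOUR_API_KEY" as a placeholder. On a server, the key is usually kept out of source code. The following is a minimal sketch, assuming you store the key in an environment variable; the helper `load_api_key` and the variable names `GEMINI_API_KEY` and `GOOGLE_API_KEY` are illustrative assumptions, so check the SDK documentation for the variables it reads on its own.

```python
import os
from typing import Sequence

# Assumption: environment variable names commonly used for Gemini API keys.
DEFAULT_KEY_VARS = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def load_api_key(var_names: Sequence[str] = DEFAULT_KEY_VARS) -> str:
    """Hypothetical helper: return the first non-empty key among `var_names`."""
    for name in var_names:
        value = os.environ.get(name, "").strip()
        if value:
            return value
    # Fail loudly instead of connecting with an empty key.
    raise RuntimeError("No API key found; set one of: " + ", ".join(var_names))
```

With this helper, the client would be created as `client = genai.Client(api_key=load_api_key())`. For client-side (browser) applications, see the ephemeral tokens guide instead of shipping a long-lived key.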

Sending text

You can send text with send_realtime_input (Python) or sendRealtimeInput (JavaScript).

Python

await session.send_realtime_input(text="Hello, how are you?")

JavaScript

session.sendRealtimeInput({
  text: 'Hello, how are you?'
});
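The snippets in this guide show single calls. Because the connection is bidirectional, an application typically sends input while it reads responses, rather than strictly alternating. The Python sketch below runs a sender and a receiver as concurrent asyncio tasks. It relies only on the `send_realtime_input(text=...)` and `receive()` methods shown in this guide; `run_duplex` and its `texts` and `on_response` parameters are hypothetical names, and any object with those two methods (such as a test double) can stand in for a real session.

```python
import asyncio
from typing import Any, Callable, Iterable


async def run_duplex(
    session: Any,
    texts: Iterable[str],
    on_response: Callable[[Any], None],
) -> None:
    """Hypothetical helper: send `texts` while concurrently consuming responses.

    `session` is assumed to expose `send_realtime_input(text=...)` and an
    async-iterable `receive()`, as in the Live API examples in this guide.
    """

    async def sender() -> None:
        for text in texts:
            await session.send_realtime_input(text=text)

    async def receiver() -> None:
        async for response in session.receive():
            on_response(response)

    # Run both directions at once; gather returns when both loops finish.
    await asyncio.gather(sender(), receiver())
```

Inside the `async with client.aio.live.connect(...) as session:` block, a call such as `await run_duplex(session, ["Hello, how are you?"], print)` would send the text and print each incoming message.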

Sending audio

Audio must be sent as raw PCM data: 16-bit, 16 kHz, little-endian.

Python

from google.genai import types

# Assuming 'chunk' is your raw PCM audio bytes
await session.send_realtime_input(
    audio=types.Blob(
        data=chunk,
        mime_type="audio/pcm;rate=16000"
    )
)

JavaScript

// Assuming 'chunk' is a Buffer of raw PCM audio
session.sendRealtimeInput({
  audio: {
    data: chunk.toString('base64'),
    mimeType: 'audio/pcm;rate=16000'
  }
});

For an example of capturing audio on a client device (such as a browser), see the end-to-end example on GitHub.
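Audio capture libraries often deliver floating-point samples, while the API expects 16-bit little-endian PCM at 16 kHz. The sketch below converts mono float samples in [-1.0, 1.0] to that byte format and splits the result into fixed-duration chunks. It assumes the samples are already mono at 16 kHz (no resampling is done), and the 100 ms chunk size and the helper names `float_to_pcm16le` and `chunk_pcm` are illustrative choices, not values required by the API.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000   # input sample rate expected by the Live API
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 100            # assumption: ~100 ms per chunk; tune for your app


def float_to_pcm16le(samples: Sequence[float]) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to 16-bit little-endian PCM."""
    # Clip out-of-range values, then scale to the signed 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    # '<' forces little-endian regardless of the host byte order.
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Split PCM bytes into chunks of roughly `chunk_ms` milliseconds each."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Each chunk would then be passed as the `data` of `types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")` in `send_realtime_input`. At 16 kHz and 2 bytes per sample, one 100 ms chunk is 3,200 bytes.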

Sending video

Send video frames as individual images (such as JPEG or PNG) at a fixed frame rate of at most 1 frame per second.

Python

from google.genai import types

# Assuming 'frame' is your JPEG-encoded image bytes
await session.send_realtime_input(
    video=types.Blob(
        data=frame,
        mime_type="image/jpeg"
    )
)

JavaScript

// Assuming 'frame' is a Buffer of JPEG-encoded image data
session.sendRealtimeInput({
  video: {
    data: frame.toString('base64'),
    mimeType: 'image/jpeg'
  }
});

For an example of capturing video on a client device (such as a browser), see the end-to-end example on GitHub.
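Cameras usually produce far more than 1 frame per second, so frames have to be dropped before sending. The sketch below is one way to do that: a small throttle that accepts a frame only if at least `min_interval_s` seconds have passed since the last accepted one. `FrameThrottle` and its injectable `clock` parameter are hypothetical names; the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Decide whether a captured video frame should be sent.

    Frames arriving sooner than `min_interval_s` after the last accepted
    frame are rejected, so by default at most 1 frame per second is sent.
    """

    def __init__(self, min_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self.min_interval_s:
            self._last_sent = now  # remember when we last accepted a frame
            return True
        return False
```

In a capture loop, each frame would be sent only when `throttle.should_send()` returns True, using the `send_realtime_input(video=types.Blob(...))` call shown above. A monotonic clock is used so that system clock adjustments cannot cause bursts or long gaps.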

Receiving audio

The model's audio response arrives in chunks.

Python

async for response in session.receive():
    if response.server_content and response.server_content.model_turn:
        for part in response.server_content.model_turn.parts:
            if part.inline_data:
                audio_data = part.inline_data.data
                # Process or play the audio data

JavaScript

// Inside the onmessage callback, where `response` is the received message
const content = response.serverContent;
if (content?.modelTurn?.parts) {
  for (const part of content.modelTurn.parts) {
    if (part.inlineData) {
      const audioData = part.inlineData.data;
      // Process or play audioData (base64 encoded string)
    }
  }
}

See the example app on GitHub for how to receive audio on the server and play it in the browser.
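To save or inspect the model's reply on the server, the received chunks can be wrapped in a WAV container with Python's standard `wave` module. This sketch assumes the chunks are mono 16-bit PCM and reads the sample rate from the part's MIME type (for example `audio/pcm;rate=24000`). The 24 kHz fallback and the helper names `parse_pcm_rate` and `pcm_chunks_to_wav` are assumptions, not details stated in this guide.

```python
import io
import wave
from typing import Iterable


def parse_pcm_rate(mime_type: str, default: int = 24_000) -> int:
    """Extract the sample rate from a MIME type like 'audio/pcm;rate=24000'."""
    for param in mime_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key == "rate" and value.isdigit():
            return int(value)
    return default  # assumption: fall back to 24 kHz if no rate is given


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int) -> bytes:
    """Wrap 16-bit PCM chunks in an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono output
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()
```

In the Python receive loop above, you would append each `part.inline_data.data` to a list, take the rate from `parse_pcm_rate(part.inline_data.mime_type or "")`, and call `pcm_chunks_to_wav` once the turn is complete.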

Receiving text

Transcriptions of both the user's input and the model's output are delivered in the server content.

Python

async for response in session.receive():
    content = response.server_content
    if content:
        if content.input_transcription:
            print(f"User: {content.input_transcription.text}")
        if content.output_transcription:
            print(f"Gemini: {content.output_transcription.text}")

JavaScript

// Inside the onmessage callback, where `response` is the received message
const content = response.serverContent;
if (content?.inputTranscription) {
  console.log('User:', content.inputTranscription.text);
}
if (content?.outputTranscription) {
  console.log('Gemini:', content.outputTranscription.text);
}
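Transcriptions are generally produced only when enabled in the session config; with the Python SDK this is done through the `input_audio_transcription` and `output_audio_transcription` config fields, e.g. `{"response_modalities": ["AUDIO"], "input_audio_transcription": {}, "output_audio_transcription": {}}` (see the capabilities guide for details). Transcription text can also arrive in several fragments per turn. The sketch below joins fragments until the server content reports `turn_complete`. `TranscriptAccumulator` is a hypothetical helper, and joining fragments with no separator assumes that each fragment already carries its own whitespace.

```python
from typing import Any, List, Optional, Tuple


class TranscriptAccumulator:
    """Join incremental transcription fragments into one string per turn.

    Expects objects shaped like `response.server_content`: optional
    `input_transcription` / `output_transcription` attributes that have a
    `.text` field, plus a `turn_complete` flag.
    """

    def __init__(self) -> None:
        self._user: List[str] = []
        self._model: List[str] = []

    def add(self, content: Any) -> Optional[Tuple[str, str]]:
        """Record fragments; return (user_text, model_text) when a turn ends."""
        if content is None:
            return None
        for attr, parts in (("input_transcription", self._user),
                            ("output_transcription", self._model)):
            fragment = getattr(content, attr, None)
            if fragment is not None and fragment.text:
                parts.append(fragment.text)
        if getattr(content, "turn_complete", False):
            result = ("".join(self._user), "".join(self._model))
            self._user.clear()   # start fresh for the next turn
            self._model.clear()
            return result
        return None
```

In the receive loop you would call `acc.add(response.server_content)` for every message and print the pair it returns, instead of printing each fragment as it arrives.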

Handling tool calls

The API supports tool calling (function calling). When the model requests a tool call, you must execute the function yourself and send its result back to the session.

Python

from google.genai import types

async for response in session.receive():
    if response.tool_call:
        function_responses = []
        for fc in response.tool_call.function_calls:
            # 1. Execute the function locally
            result = my_tool_function(**fc.args)

            # 2. Prepare the response
            function_responses.append(types.FunctionResponse(
                name=fc.name,
                id=fc.id,
                response={"result": result}
            ))

        # 3. Send the tool response back to the session
        await session.send_tool_response(function_responses=function_responses)

JavaScript

// Inside the onmessage callback, where `response` is the received message
if (response.toolCall) {
  const functionResponses = [];
  for (const fc of response.toolCall.functionCalls) {
    const result = myToolFunction(fc.args);
    functionResponses.push({
      name: fc.name,
      id: fc.id,
      response: { result }
    });
  }
  session.sendToolResponse({ functionResponses });
}
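The examples above call a single `my_tool_function`. With several tools, a common pattern is a table that maps function names to local callables, which also lets unknown names and exceptions be reported back to the model instead of ending the receive loop. In this sketch, `get_weather`, `TOOLS`, and `build_function_responses` are hypothetical. The `"result"` key follows the examples above, and using an `"error"` key for failures is an assumption based on the SDK's `FunctionResponse` conventions.

```python
from typing import Any, Callable, Dict, List


# Hypothetical local tool; replace with your own functions.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def build_function_responses(function_calls: List[Any]) -> List[Dict[str, Any]]:
    """Run each requested function and build one response payload per call.

    Each call is expected to have `name`, `id`, and `args` attributes, like
    the items of `response.tool_call.function_calls`.
    """
    responses = []
    for fc in function_calls:
        func = TOOLS.get(fc.name)
        if func is None:
            payload = {"error": f"unknown tool: {fc.name}"}
        else:
            try:
                payload = {"result": func(**(fc.args or {}))}
            except Exception as exc:  # report the failure to the model
                payload = {"error": str(exc)}
        # Echo the call's id so the model can match responses to requests.
        responses.append({"name": fc.name, "id": fc.id, "response": payload})
    return responses
```

The dictionaries can then be converted to SDK objects, for example `await session.send_tool_response(function_responses=[types.FunctionResponse(**r) for r in build_function_responses(response.tool_call.function_calls)])`.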

Next steps

Read the full Live API capabilities guide for key capabilities and configuration options, including voice activity detection and native audio features.
Read the tool use guide to learn how to integrate the Live API with tools and function calling.
Read the session management guide to learn how to manage long-running conversations.
Read the ephemeral tokens guide for secure authentication in client-to-server applications.
For more information about the underlying WebSockets API, see the WebSockets API reference.