Get started with Live API

The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users.

Live API overview

The Live API offers a comprehensive set of features, including Voice Activity Detection, tool use and function calling, session management (for managing long-running conversations), and ephemeral tokens (for secure client-side authentication).

This page gets you up and running with examples and basic code samples.

Try the Live API in Google AI Studio

Choose an implementation approach

When integrating with the Live API, you need to choose one of the following implementation approaches:

Server-to-server: Your backend connects to the Live API using WebSockets. Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API.
Client-to-server: Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend.
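
The forwarding step in the server-to-server approach can be sketched roughly as follows. This is a minimal illustration, not part of any SDK: `relay`, `client_chunks`, `send_upstream`, and `max_buffered` are hypothetical names. In a real backend, `send_upstream` would presumably wrap a call such as `session.send_realtime_input` from the Python example on this page, and `client_chunks` would come from your own client connection.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional


async def relay(
    client_chunks: AsyncIterator[bytes],
    send_upstream: Callable[[bytes], Awaitable[None]],
    max_buffered: int = 5,
) -> int:
    """Forward chunks from a client stream to an upstream sender.

    Hypothetical helper: returns the number of chunks forwarded.
    """
    # A bounded queue applies backpressure: the reader pauses when the
    # upstream sender falls behind by more than `max_buffered` chunks.
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue(maxsize=max_buffered)
    forwarded = 0

    async def pump() -> None:
        # Read from the client and buffer each chunk.
        async for chunk in client_chunks:
            await queue.put(chunk)
        await queue.put(None)  # sentinel: the client stream has ended

    async def drain() -> None:
        # Take buffered chunks in order and forward them upstream.
        nonlocal forwarded
        while True:
            chunk = await queue.get()
            if chunk is None:
                break
            await send_upstream(chunk)
            forwarded += 1

    await asyncio.gather(pump(), drain())
    return forwarded
```

The bounded queue mirrors the `maxsize=5` microphone queue used in the Python example on this page; whether that bound suits your latency budget is something to measure rather than assume.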

Partner integrations

To streamline the development of real-time audio and video apps, you can use a third-party integration that supports the Gemini Live API over WebRTC or WebSockets.

Pipecat by Daily

Create a real-time AI chatbot using Gemini Live and Pipecat.

LiveKit

Use the Gemini Live API with LiveKit Agents.

Agent Development Kit (ADK)

Implement the Live API with Agent Development Kit (ADK).

Voximplant

Implement the Live API with Voximplant.

Get started

This server-side example streams audio from the microphone and plays the returned audio. For complete end-to-end examples, including a client application, see Example applications.

Input audio must be 16-bit PCM, 16 kHz, mono. Received audio uses a 24 kHz sample rate.
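
To make these format numbers concrete, the sketch below works out chunk sizes and durations and generates a short test tone in the input format. The helper names (`chunk_size_bytes`, `chunk_duration_ms`, `pcm16_tone`) are hypothetical and exist only for illustration. Little-endian byte order is an assumption, since this page does not state it.

```python
import math
import struct

SEND_SAMPLE_RATE = 16000     # Hz, input rate stated above
RECEIVE_SAMPLE_RATE = 24000  # Hz, output rate stated above
BYTES_PER_SAMPLE = 2         # 16-bit PCM
CHANNELS = 1                 # mono


def chunk_size_bytes(frames: int) -> int:
    """Size in bytes of a chunk holding `frames` mono 16-bit frames."""
    return frames * BYTES_PER_SAMPLE * CHANNELS


def chunk_duration_ms(frames: int, sample_rate: int) -> float:
    """Duration in milliseconds of `frames` frames at `sample_rate`."""
    return 1000.0 * frames / sample_rate


def pcm16_tone(freq_hz: float, duration_s: float,
               sample_rate: int = SEND_SAMPLE_RATE,
               amplitude: float = 0.5) -> bytes:
    """Hypothetical helper: a mono sine tone as 16-bit PCM bytes.

    Assumes little-endian sample order ("<h" in struct notation).
    """
    n_samples = int(duration_s * sample_rate)
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]
    return struct.pack(f"<{n_samples}h", *samples)


if __name__ == "__main__":
    print(chunk_size_bytes(1024))                        # 2048
    print(chunk_duration_ms(1024, SEND_SAMPLE_RATE))     # 64.0
    print(len(pcm16_tone(440.0, 0.5)))                   # 16000
```

With the 1024-frame chunk size used in the Python example, each microphone read therefore carries 64 ms of audio in 2048 bytes.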

Python

Install helpers for audio streaming. Additional system-level dependencies (for example, portaudio) might be required. Refer to the PyAudio docs for detailed installation steps.

pip install pyaudio

import asyncio
from google import genai
import pyaudio

client = genai.Client()

# --- pyaudio config ---
FORMAT = pyaudio.paInt16
CHANNELS = 1
SEND_SAMPLE_RATE = 16000
RECEIVE_SAMPLE_RATE = 24000
CHUNK_SIZE = 1024

pya = pyaudio.PyAudio()

# --- Live API config ---
MODEL = "gemini-2.5-flash-native-audio-preview-09-2025"
CONFIG = {
    "response_modalities": ["AUDIO"],
    "system_instruction": "You are a helpful and friendly AI assistant.",
}

audio_queue_output = asyncio.Queue()
audio_queue_mic = asyncio.Queue(maxsize=5)
audio_stream = None

async def listen_audio():
    """Listens for audio and puts it into the mic audio queue."""
    global audio_stream
    mic_info = pya.get_default_input_device_info()
    audio_stream = await asyncio.to_thread(
        pya.open,
        format=FORMAT,
        channels=CHANNELS,
        rate=SEND_SAMPLE_RATE,
        input=True,
        input_device_index=mic_info["index"],
        frames_per_buffer=CHUNK_SIZE,
    )
    kwargs = {"exception_on_overflow": False} if __debug__ else {}
    while True:
        data = await asyncio.to_thread(audio_stream.read, CHUNK_SIZE, **kwargs)
        await audio_queue_mic.put({"data": data, "mime_type": "audio/pcm"})

async def send_realtime(session):
    """Sends audio from the mic audio queue to the GenAI session."""
    while True:
        msg = await audio_queue_mic.get()
        await session.send_realtime_input(audio=msg)

async def receive_audio(session):
    """Receives responses from GenAI and puts audio data into the speaker audio queue."""
    while True:
        turn = session.receive()
        async for response in turn:
            if (response.server_content and response.server_content.model_turn):
                for part in response.server_content.model_turn.parts:
                    if part.inline_data and isinstance(part.inline_data.data, bytes):
                        audio_queue_output.put_nowait(part.inline_data.data)

        # Empty the queue on interruption to stop playback
        while not audio_queue_output.empty():
            audio_queue_output.get_nowait()

async def play_audio():
    """Plays audio from the speaker audio queue."""
    stream = await asyncio.to_thread(
        pya.open,
        format=FORMAT,
        channels=CHANNELS,
        rate=RECEIVE_SAMPLE_RATE,
        output=True,
    )
    while True:
        bytestream = await audio_queue_output.get()
        await asyncio.to_thread(stream.write, bytestream)

async def run():
    """Main function to run the audio loop."""
    try:
        async with client.aio.live.connect(
            model=MODEL, config=CONFIG
        ) as live_session:
            print("Connected to Gemini. Start speaking!")
            # Note: asyncio.TaskGroup requires Python 3.11 or later.
            async with asyncio.TaskGroup() as tg:
                tg.create_task(send_realtime(live_session))
                tg.create_task(listen_audio())
                tg.create_task(receive_audio(live_session))
                tg.create_task(play_audio())
    except asyncio.CancelledError:
        pass
    finally:
        if audio_stream:
            audio_stream.close()
        pya.terminate()
        print("\nConnection closed.")

if __name__ == "__main__":
    try:
        asyncio.run(run())
    except KeyboardInterrupt:
        print("Interrupted by user.")
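
The interruption handling in `receive_audio` relies on discarding whatever audio is still queued for playback. The sketch below isolates that step so it can be tried without a microphone or network connection. `drain` is a hypothetical helper name, and the byte strings stand in for audio chunks.

```python
import asyncio


def drain(queue: asyncio.Queue) -> int:
    """Discard every item currently in the queue; return how many were dropped."""
    dropped = 0
    while True:
        try:
            queue.get_nowait()
        except asyncio.QueueEmpty:
            # Nothing left to discard.
            return dropped
        dropped += 1


async def demo() -> tuple:
    # Pretend three audio chunks are waiting to be played.
    pending: asyncio.Queue = asyncio.Queue()
    for chunk in (b"a", b"b", b"c"):
        pending.put_nowait(chunk)
    dropped = drain(pending)
    return dropped, pending.empty()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (3, True)
```

Catching `asyncio.QueueEmpty` instead of checking `empty()` first is a stylistic choice here; both behave the same when only one task touches the queue.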

JavaScript

Install helpers for audio streaming. Additional system-level dependencies might be required (sox for Mac/Windows or ALSA for Linux). Refer to the speaker and mic docs for detailed installation steps.

npm install mic speaker

import { GoogleGenAI, Modality } from '@google/genai';
import mic from 'mic';
import Speaker from 'speaker';

const ai = new GoogleGenAI({});
// WARNING: Do not use API keys in client-side (browser based) applications
// Consider using Ephemeral Tokens instead
// More information at: https://ai.google.dev/gemini-api/docs/ephemeral-tokens

// --- Live API config ---
const model = 'gemini-2.5-flash-native-audio-preview-09-2025';
const config = {
  responseModalities: [Modality.AUDIO],
  systemInstruction: "You are a helpful and friendly AI assistant.",
};

async function live() {
  const responseQueue = [];
  const audioQueue = [];
  let speaker;

  async function waitMessage() {
    while (responseQueue.length === 0) {
      await new Promise((resolve) => setImmediate(resolve));
    }
    return responseQueue.shift();
  }

  function createSpeaker() {
    if (speaker) {
      process.stdin.unpipe(speaker);
      speaker.end();
    }
    speaker = new Speaker({
      channels: 1,
      bitDepth: 16,
      sampleRate: 24000,
    });
    speaker.on('error', (err) => console.error('Speaker error:', err));
    process.stdin.pipe(speaker);
  }

  async function messageLoop() {
    // Puts incoming messages in the audio queue.
    while (true) {
      const message = await waitMessage();
      if (message.serverContent && message.serverContent.interrupted) {
        // Empty the queue on interruption to stop playback
        audioQueue.length = 0;
        continue;
      }
      if (message.serverContent && message.serverContent.modelTurn && message.serverContent.modelTurn.parts) {
        for (const part of message.serverContent.modelTurn.parts) {
          if (part.inlineData && part.inlineData.data) {
            audioQueue.push(Buffer.from(part.inlineData.data, 'base64'));
          }
        }
      }
    }
  }

  async function playbackLoop() {
    // Plays audio from the audio queue.
    while (true) {
      if (audioQueue.length === 0) {
        if (speaker) {
          // Destroy speaker if no more audio to avoid warnings from speaker library
          process.stdin.unpipe(speaker);
          speaker.end();
          speaker = null;
        }
        await new Promise((resolve) => setImmediate(resolve));
      } else {
        if (!speaker) createSpeaker();
        const chunk = audioQueue.shift();
        await new Promise((resolve) => {
          speaker.write(chunk, () => resolve());
        });
      }
    }
  }

  // Start loops
  messageLoop();
  playbackLoop();

  // Connect to Gemini Live API
  const session = await ai.live.connect({
    model: model,
    config: config,
    callbacks: {
      onopen: () => console.log('Connected to Gemini Live API'),
      onmessage: (message) => responseQueue.push(message),
      onerror: (e) => console.error('Error:', e.message),
      onclose: (e) => console.log('Closed:', e.reason),
    },
  });

  // Setup Microphone for input
  const micInstance = mic({
    rate: '16000',
    bitwidth: '16',
    channels: '1',
  });
  const micInputStream = micInstance.getAudioStream();

  micInputStream.on('data', (data) => {
    // API expects base64 encoded PCM data
    session.sendRealtimeInput({
      audio: {
        data: data.toString('base64'),
        mimeType: "audio/pcm;rate=16000"
      }
    });
  });

  micInputStream.on('error', (err) => {
    console.error('Microphone error:', err);
  });

  micInstance.start();
  console.log('Microphone started. Speak now...');
}

live().catch(console.error);

Example applications

Check out the following example applications that illustrate how to use the Live API for end-to-end use cases:

Live audio starter app on AI Studio, using JavaScript libraries to connect to the Live API and stream bidirectional audio through your microphone and speakers.
See Partner integrations for additional examples and getting started guides.

What's next

Read the full Live API Capabilities guide for key capabilities and configurations, including Voice Activity Detection and native audio features.
Read the Tool use guide to learn how to integrate the Live API with tools and function calling.
Read the Session management guide for managing long-running conversations.
Read the Ephemeral tokens guide for secure authentication in client-to-server applications.
For more information about the underlying WebSockets API, see the WebSockets API reference.