Get started with Live API

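The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users.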

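Live API overview

The Live API offers a set of features such as Voice Activity Detection, tool use and function calling, session management (for managing long-running conversations), and ephemeral tokens (for secure client-side authentication).

This page gets you up and running with examples and basic code samples.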

Try the Live API in Google AI Studio

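Example applications

See the following example applications, which show how to use the Live API for end-to-end use cases:

Live audio starter app on AI Studio, which uses JavaScript libraries to connect to the Live API and stream bidirectional audio through your microphone and speakers.
Live API Python cookbook, which uses Pyaudio to connect to the Live API.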

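Partner integrations

If you prefer a simpler development process, you can use Daily, LiveKit, or Voximplant. These third-party partner platforms have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications.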

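Choose an implementation approach

When integrating with the Live API, you need to choose one of the following implementation approaches:

Server-to-server: Your backend connects to the Live API using WebSockets. Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API.
Client-to-server: Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend.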
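
To make the server-to-server flow more concrete, here is a minimal sketch of the forwarding step only. It does not use any SDK: `relay`, `client_chunks`, and `send_upstream` are hypothetical names introduced for illustration, and an `asyncio.Queue` stands in for whatever transport your client uses to reach your backend. In a real backend, `send_upstream` would presumably wrap a call such as `session.send_realtime_input(...)` from the Python sample later on this page.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def relay(
    client_chunks: "asyncio.Queue[Optional[bytes]]",
    send_upstream: Callable[[bytes], Awaitable[None]],
) -> int:
    """Forward chunks from the client to the upstream API.

    Reads chunks from the queue until a None sentinel arrives and passes
    each one to send_upstream. Returns the number of chunks forwarded.
    """
    forwarded = 0
    while True:
        chunk = await client_chunks.get()
        if chunk is None:  # Sentinel: the client has finished streaming.
            break
        await send_upstream(chunk)
        forwarded += 1
    return forwarded


async def demo() -> tuple:
    """Run the relay against an in-memory stand-in for the upstream API."""
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    for item in (b"chunk-1", b"chunk-2", None):
        queue.put_nowait(item)

    received = []

    async def fake_upstream(chunk: bytes) -> None:
        # Hypothetical stand-in for the call that sends data to the Live API.
        received.append(chunk)

    count = await relay(queue, fake_upstream)
    return count, received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the client-to-server approach this forwarding hop disappears, because the frontend holds the WebSocket connection itself.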

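Get started

This example reads a WAV file, sends it in the correct format, and saves the received data as a WAV file.

You can send audio by converting it to 16-bit PCM, 16 kHz, mono format, and you can receive audio by setting AUDIO as the response modality. The output uses a sample rate of 24 kHz.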
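
Before converting a file, it can help to check whether it already matches the input format described above. The following is a minimal sketch using only the Python standard library; `check_live_api_input` and `make_silence_wav` are hypothetical helper names introduced here, not part of any SDK.

```python
import io
import wave

# Input format stated above: 16-bit PCM, 16 kHz, mono.
EXPECTED_SAMPLE_WIDTH_BYTES = 2
EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_CHANNELS = 1


def check_live_api_input(wav_bytes: bytes) -> list:
    """Return a list of mismatches between a WAV file and the expected format."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wf.getsampwidth() * 8}-bit, expected 16-bit")
        if wf.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected 16000 Hz")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels, expected 1 (mono)")
    return problems


def make_silence_wav(rate: int, channels: int, frames: int = 1600) -> bytes:
    """Build a short 16-bit silent WAV file in memory (used for the demo)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * channels * frames)
    return buffer.getvalue()


if __name__ == "__main__":
    print(check_live_api_input(make_silence_wav(16000, 1)))
    print(check_live_api_input(make_silence_wav(44100, 2)))
```

An empty list means the file can be sent without resampling; otherwise, convert it first, as the samples below do.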

Python

# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
import wave
from google import genai
from google.genai import types
import soundfile as sf
import librosa

client = genai.Client()

# New native audio model:
model = "gemini-2.5-flash-native-audio-preview-09-2025"

config = {
  "response_modalities": ["AUDIO"],
  "system_instruction": "You are a helpful assistant and answer in a friendly tone.",
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:

        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format='RAW', subtype='PCM_16')
        buffer.seek(0)
        audio_bytes = buffer.read()

        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        wf = wave.open("audio.wav", "wb")
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)  # Output is 24kHz

        async for response in session.receive():
            if response.data is not None:
                wf.writeframes(response.data)

            # Un-comment this code to print audio data info
            # if response.server_content.model_turn is not None:
            #      print(response.server_content.model_turn.parts[0].inline_data.mime_type)

        wf.close()

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

// Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';  // npm install wavefile
const { WaveFile } = pkg;

const ai = new GoogleGenAI({});
// WARNING: Do not use API keys in client-side (browser based) applications
// Consider using Ephemeral Tokens instead
// More information at: https://ai.google.dev/gemini-api/docs/ephemeral-tokens

// New native audio model:
const model = "gemini-2.5-flash-native-audio-preview-09-2025"

const config = {
  responseModalities: [Modality.AUDIO],
  systemInstruction: "You are a helpful assistant and answer in a friendly tone."
};

async function live() {
    const responseQueue = [];

    async function waitMessage() {
        let done = false;
        let message = undefined;
        while (!done) {
            message = responseQueue.shift();
            if (message) {
                done = true;
            } else {
                await new Promise((resolve) => setTimeout(resolve, 100));
            }
        }
        return message;
    }

    async function handleTurn() {
        const turns = [];
        let done = false;
        while (!done) {
            const message = await waitMessage();
            turns.push(message);
            if (message.serverContent && message.serverContent.turnComplete) {
                done = true;
            }
        }
        return turns;
    }

    const session = await ai.live.connect({
        model: model,
        callbacks: {
            onopen: function () {
                console.debug('Opened');
            },
            onmessage: function (message) {
                responseQueue.push(message);
            },
            onerror: function (e) {
                console.debug('Error:', e.message);
            },
            onclose: function (e) {
                console.debug('Close:', e.reason);
            },
        },
        config: config,
    });

    // Send Audio Chunk
    const fileBuffer = fs.readFileSync("sample.wav");

    // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
    const wav = new WaveFile();
    wav.fromBuffer(fileBuffer);
    wav.toSampleRate(16000);
    wav.toBitDepth("16");
    const base64Audio = wav.toBase64();

    // If already in correct format, you can use this:
    // const fileBuffer = fs.readFileSync("sample.pcm");
    // const base64Audio = Buffer.from(fileBuffer).toString('base64');

    session.sendRealtimeInput(
        {
            audio: {
                data: base64Audio,
                mimeType: "audio/pcm;rate=16000"
            }
        }

    );

    const turns = await handleTurn();

    // Combine audio data strings and save as wave file
    const combinedAudio = turns.reduce((acc, turn) => {
        if (turn.data) {
            const buffer = Buffer.from(turn.data, 'base64');
            const intArray = new Int16Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / Int16Array.BYTES_PER_ELEMENT);
            return acc.concat(Array.from(intArray));
        }
        return acc;
    }, []);

    const audioBuffer = new Int16Array(combinedAudio);

    const wf = new WaveFile();
    wf.fromScratch(1, 24000, '16', audioBuffer);  // output is 24kHz
    fs.writeFileSync('audio.wav', wf.toBuffer());

    session.close();
}

async function main() {
    await live().catch((e) => console.error('got error', e));
}

main();
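
The samples above send the whole file in a single call. For a continuous source such as a microphone, audio would more likely be sent in small pieces. The sketch below shows one way to split raw 16-bit, 16 kHz PCM into fixed-duration chunks; `pcm_chunks` is a hypothetical helper, and the 100 ms chunk length is an assumption chosen for illustration, not a value prescribed by the Live API.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16000,
    chunk_ms: int = 100,
    sample_width: int = 2,
) -> Iterator[bytes]:
    """Yield consecutive slices of mono PCM audio, each chunk_ms long.

    The last chunk may be shorter when the audio length is not an exact
    multiple of the chunk size.
    """
    bytes_per_chunk = sample_rate * sample_width * chunk_ms // 1000
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]


if __name__ == "__main__":
    one_second = b"\x00\x00" * 16000  # 1 s of 16-bit mono silence at 16 kHz
    chunks = list(pcm_chunks(one_second))
    print(len(chunks), len(chunks[0]))
```

Each chunk could then be passed to `session.send_realtime_input` in the same `types.Blob` form used in the Python sample, with the `audio/pcm;rate=16000` MIME type.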

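What's next

Read the full Live API Capabilities guide for key capabilities and configurations, including Voice Activity Detection and native audio features.
Read the Tool use guide to learn how to integrate the Live API with tools and function calling.
Read the Session management guide for managing long-running conversations.
Read the Ephemeral tokens guide for secure authentication in client-to-server applications.
For more information about the underlying WebSockets API, see the WebSockets API reference.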