This comprehensive guide covers the capabilities and configurations available with the Live API. See the Get started with Live API page for an overview and sample code for common use cases.
Before you begin
- Familiarize yourself with the core concepts: If you haven't already done so, read the Get started with Live API page first. It introduces the fundamental principles of the Live API, how it works, and the differences between the models and their corresponding audio generation methods (native audio or half-cascade).
- Try the Live API in AI Studio: You may find it useful to try the Live API in Google AI Studio before you start building your application. To use the Live API in Google AI Studio, select Stream.
Establishing a connection
The following example shows how to create a connection with an API key:
Python
import asyncio
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print("Session started")
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function main() {
  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        console.debug(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });
  // Send content...
  session.close();
}
main();
Interaction modalities
The following sections provide examples and supporting context for the different interaction modalities available in the Live API.
Sending and receiving text
Here's how you can send and receive text:
Python
import asyncio
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello, how are you?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )
        async for response in session.receive():
            if response.text is not None:
                print(response.text, end="")
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
  const responseQueue = [];
  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }
  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }
  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });
  const inputTurns = 'Hello how are you?';
  session.sendClientContent({ turns: inputTurns });
  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
  }
  session.close();
}
async function main() {
  await live().catch((e) => console.error('got error', e));
}
main();
Incremental content updates
Use incremental updates to send text input, establish session context, or restore session context. For short contexts you can send turn-by-turn interactions to represent the exact sequence of events:
Python
turns = [
    {"role": "user", "parts": [{"text": "What is the capital of France?"}]},
    {"role": "model", "parts": [{"text": "Paris"}]},
]
await session.send_client_content(turns=turns, turn_complete=False)
turns = [{"role": "user", "parts": [{"text": "What is the capital of Germany?"}]}]
await session.send_client_content(turns=turns, turn_complete=True)
JavaScript
let inputTurns = [
  { "role": "user", "parts": [{ "text": "What is the capital of France?" }] },
  { "role": "model", "parts": [{ "text": "Paris" }] },
]
session.sendClientContent({ turns: inputTurns, turnComplete: false })
inputTurns = [{ "role": "user", "parts": [{ "text": "What is the capital of Germany?" }] }]
session.sendClientContent({ turns: inputTurns, turnComplete: true })
For longer contexts, it's recommended to provide a single message summary to free up the context window for subsequent interactions. See Session resumption for another method for loading session context.
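For instance, the condensed history can be sent as a single turn before the conversation continues. The following sketch is illustrative only: the summary text is a placeholder and session is assumed to be an open Live API session.
Python
# Illustrative sketch: restore a long prior conversation as one summary turn.
# The summary text is a placeholder; `session` is an open Live API session.
summary = "Earlier the user asked about European capitals; the model answered Paris and Berlin."

await session.send_client_content(
    turns=[{"role": "user", "parts": [{"text": f"Summary of the conversation so far: {summary}"}]}],
    turn_complete=False,
)
# Send the next user message with turn_complete=True as usual.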
Sending and receiving audio
The most common audio example, audio-to-audio, is covered in the Get started guide.
Here's an audio-to-text example that reads a WAV file, sends it in the correct format, and receives text output:
Python
# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
from google import genai
from google.genai import types
import soundfile as sf
import librosa
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format='RAW', subtype='PCM_16')
        buffer.seek(0)
        audio_bytes = buffer.read()
        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()
        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )
        async for response in session.receive():
            if response.text is not None:
                print(response.text)
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
// Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
// Install helpers for converting files: npm install wavefile
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
  const responseQueue = [];
  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }
  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }
  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });
  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("sample.wav");
  // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
  const wav = new WaveFile();
  wav.fromBuffer(fileBuffer);
  wav.toSampleRate(16000);
  wav.toBitDepth("16");
  const base64Audio = wav.toBase64();
  // If already in correct format, you can use this:
  // const fileBuffer = fs.readFileSync("sample.pcm");
  // const base64Audio = Buffer.from(fileBuffer).toString('base64');
  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }
  );
  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
  }
  session.close();
}
async function main() {
  await live().catch((e) => console.error('got error', e));
}
main();
And here is a text-to-audio example.
You can receive audio by setting AUDIO as the response modality. This example saves the received data as a WAV file:
Python
import asyncio
import wave
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["AUDIO"]}
async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        wf = wave.open("audio.wav", "wb")
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)
        message = "Hello how are you?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )
        async for response in session.receive():
            if response.data is not None:
                wf.writeframes(response.data)
            # Un-comment this code to print audio data info
            # if response.server_content.model_turn is not None:
            #      print(response.server_content.model_turn.parts[0].inline_data.mime_type)
        wf.close()
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.AUDIO] };
async function live() {
  const responseQueue = [];
  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }
  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }
  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });
  const inputTurns = 'Hello how are you?';
  session.sendClientContent({ turns: inputTurns });
  const turns = await handleTurn();
  // Combine audio data strings and save as wave file
  const combinedAudio = turns.reduce((acc, turn) => {
    if (turn.data) {
      const buffer = Buffer.from(turn.data, 'base64');
      const intArray = new Int16Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / Int16Array.BYTES_PER_ELEMENT);
      return acc.concat(Array.from(intArray));
    }
    return acc;
  }, []);
  const audioBuffer = new Int16Array(combinedAudio);
  const wf = new WaveFile();
  wf.fromScratch(1, 24000, '16', audioBuffer);
  fs.writeFileSync('output.wav', wf.toBuffer());
  session.close();
}
async function main() {
  await live().catch((e) => console.error('got error', e));
}
main();
Audio formats
Audio data in the Live API is always raw, little-endian, 16-bit PCM. Audio output always uses a sample rate of 24kHz. Input audio is natively 16kHz, but the Live API will resample if needed, so any sample rate can be sent. To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value like audio/pcm;rate=16000.
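As an illustrative example (assuming the librosa and soundfile helpers used in the audio-to-text example above, and an open session; clip.wav is a placeholder file), you can keep a file's native sample rate and simply declare it in the MIME type:
Python
# Sketch: send raw 16-bit PCM at the file's native sample rate and declare
# that rate in the MIME type. Assumes this runs inside an open session block.
import io
import librosa
import soundfile as sf
from google.genai import types

y, sr = librosa.load("clip.wav", sr=None)  # keep the file's native sample rate
buffer = io.BytesIO()
sf.write(buffer, y, sr, format="RAW", subtype="PCM_16")  # raw little-endian 16-bit PCM

await session.send_realtime_input(
    audio=types.Blob(data=buffer.getvalue(), mime_type=f"audio/pcm;rate={sr}")
)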
Audio transcriptions
You can enable transcription of the model's audio output by sending output_audio_transcription in the setup config. The transcription language is inferred from the model's response.
Python
import asyncio
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["AUDIO"],
        "output_audio_transcription": {}
}
async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello? Gemini are you there?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )
        async for response in session.receive():
            if response.server_content.model_turn:
                print("Model turn:", response.server_content.model_turn)
            if response.server_content.output_transcription:
                print("Transcript:", response.server_content.output_transcription.text)
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = {
  responseModalities: [Modality.AUDIO],
  outputAudioTranscription: {}
};
async function live() {
  const responseQueue = [];
  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }
  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }
  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });
  const inputTurns = 'Hello how are you?';
  session.sendClientContent({ turns: inputTurns });
  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.serverContent && turn.serverContent.outputTranscription) {
      console.debug('Received output transcription: %s\n', turn.serverContent.outputTranscription.text);
    }
  }
  session.close();
}
async function main() {
  await live().catch((e) => console.error('got error', e));
}
main();
You can enable transcription of the audio input by sending input_audio_transcription in the setup config.
Python
import asyncio
from pathlib import Path
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {
    "response_modalities": ["TEXT"],
    "input_audio_transcription": {},
}
async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_data = Path("16000.pcm").read_bytes()
        await session.send_realtime_input(
            audio=types.Blob(data=audio_data, mime_type='audio/pcm;rate=16000')
        )
        async for msg in session.receive():
            if msg.server_content.input_transcription:
                print('Transcript:', msg.server_content.input_transcription.text)
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = {
  responseModalities: [Modality.TEXT],
  inputAudioTranscription: {}
};
async function live() {
  const responseQueue = [];
  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }
  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }
  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });
  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("16000.wav");
  // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
  const wav = new WaveFile();
  wav.fromBuffer(fileBuffer);
  wav.toSampleRate(16000);
  wav.toBitDepth("16");
  const base64Audio = wav.toBase64();
  // If already in correct format, you can use this:
  // const fileBuffer = fs.readFileSync("sample.pcm");
  // const base64Audio = Buffer.from(fileBuffer).toString('base64');
  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }
  );
  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.serverContent && turn.serverContent.outputTranscription) {
      console.log("Transcription")
      console.log(turn.serverContent.outputTranscription.text);
    }
  }
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
    else if (turn.serverContent && turn.serverContent.inputTranscription) {
      console.debug('Received input transcription: %s\n', turn.serverContent.inputTranscription.text);
    }
  }
  session.close();
}
async function main() {
  await live().catch((e) => console.error('got error', e));
}
main();
Stream audio and video
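The Live API is built for streaming input: rather than uploading a complete file, clients typically capture audio (and optionally video frames) and send it in small chunks as it is produced. The sketch below is illustrative only; it paces chunks of a prerecorded PCM file through send_realtime_input to approximate live capture, and the chunk size and sleep interval are assumptions. For complete streaming audio and video examples, see the Live API cookbooks linked under What's next.
Python
# Illustrative sketch: stream a prerecorded PCM file in ~100 ms chunks to
# approximate real-time capture. Assumes an open session with automatic VAD
# (the default); a real app would read chunks from a microphone instead.
import asyncio
from pathlib import Path
from google.genai import types

CHUNK_SIZE = 3200  # 100 ms of 16 kHz, 16-bit, mono PCM

audio = Path("sample.pcm").read_bytes()
for start in range(0, len(audio), CHUNK_SIZE):
    await session.send_realtime_input(
        audio=types.Blob(data=audio[start:start + CHUNK_SIZE], mime_type="audio/pcm;rate=16000")
    )
    await asyncio.sleep(0.1)  # pace the chunks roughly at real-time speed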
Change voice and language
The Live API models each support a different set of voices. Half-cascade supports Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr. Native audio supports a much longer list of voices (identical to the TTS model list). You can listen to all the voices in AI Studio.
To specify a voice, set the voice name within the speechConfig object as part of the session configuration:
Python
config = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
    },
}
JavaScript
const config = {
  responseModalities: [Modality.AUDIO],
  speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } } }
};
The Live API supports multiple languages.
To change the language, set the language code within the speechConfig object as part of the session configuration:
Python
config = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "language_code": "de-DE"
    }
}
JavaScript
const config = {
  responseModalities: [Modality.AUDIO],
  speechConfig: { languageCode: "de-DE" }
};
Native audio capabilities
The following capabilities are only available with native audio. You can learn more about native audio in Choose a model and audio generation.
How to use native audio output
To use native audio output, configure one of the native audio models and set response_modalities to AUDIO.
See Sending and receiving audio for a full example.
Python
model = "gemini-2.5-flash-native-audio-preview-09-2025"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])
async with client.aio.live.connect(model=model, config=config) as session:
    # Send audio input and receive audio
    ...
JavaScript
const model = 'gemini-2.5-flash-native-audio-preview-09-2025';
const config = { responseModalities: [Modality.AUDIO] };
async function main() {
  const session = await ai.live.connect({
    model: model,
    config: config,
    callbacks: ...,
  });
  // Send audio input and receive audio
  session.close();
}
main();
Affective dialog
This feature lets Gemini adapt its response style to the expression and tone of the input.
To use affective dialog, set the API version to v1alpha and set enable_affective_dialog to true in the setup message:
Python
client = genai.Client(http_options={"api_version": "v1alpha"})
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True
)
JavaScript
const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });
const config = {
  responseModalities: [Modality.AUDIO],
  enableAffectiveDialog: true
};
Note that affective dialog is currently only supported by the native audio output models.
Proactive audio
When this feature is enabled, Gemini can proactively decide not to respond if the content is not relevant.
To use it, set the API version to v1alpha and configure the proactivity field in the setup message, setting proactive_audio to true:
Python
client = genai.Client(http_options={"api_version": "v1alpha"})
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity={'proactive_audio': True}
)
JavaScript
const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });
const config = {
  responseModalities: [Modality.AUDIO],
  proactivity: { proactiveAudio: true }
}
Note that proactive audio is currently only supported by the native audio output models.
Thinking
The latest native audio output model, gemini-2.5-flash-native-audio-preview-09-2025, supports thinking capabilities, and dynamic thinking is enabled by default.
The thinkingBudget parameter guides the model on the number of thinking tokens to use when generating a response. To disable thinking, set thinkingBudget to 0. For more details on the thinkingBudget configuration for this model, see the thinking budget documentation.
Python
model = "gemini-2.5-flash-native-audio-preview-09-2025"
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    thinking_config=types.ThinkingConfig(
        thinking_budget=1024,
    )
)
async with client.aio.live.connect(model=model, config=config) as session:
    # Send audio input and receive audio
    ...
JavaScript
const model = 'gemini-2.5-flash-native-audio-preview-09-2025';
const config = {
  responseModalities: [Modality.AUDIO],
  thinkingConfig: {
    thinkingBudget: 1024,
  },
};
async function main() {
  const session = await ai.live.connect({
    model: model,
    config: config,
    callbacks: ...,
  });
  // Send audio input and receive audio
  session.close();
}
main();
Additionally, you can enable thought summaries by setting includeThoughts to true in your configuration. See thought summaries for more details:
Python
model = "gemini-2.5-flash-native-audio-preview-09-2025"
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    thinking_config=types.ThinkingConfig(
        thinking_budget=1024,
        include_thoughts=True
    )
)
JavaScript
const model = 'gemini-2.5-flash-native-audio-preview-09-2025';
const config = {
  responseModalities: [Modality.AUDIO],
  thinkingConfig: {
    thinkingBudget: 1024,
    includeThoughts: true,
  },
};
Voice Activity Detection (VAD)
Voice Activity Detection (VAD) allows the model to recognize when a person is speaking. This is essential for creating natural conversations, as it allows a user to interrupt the model at any time.
When VAD detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message to report the interruption.
The Gemini server then discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls.
Python
async for response in session.receive():
    if response.server_content.interrupted is True:
        # The generation was interrupted
        # If realtime playback is implemented in your application,
        # you should stop playing audio and clear queued playback here.
        ...
JavaScript
const turns = await handleTurn();
for (const turn of turns) {
  if (turn.serverContent && turn.serverContent.interrupted) {
    // The generation was interrupted
    // If realtime playback is implemented in your application,
    // you should stop playing audio and clear queued playback here.
  }
}
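If your session uses function calling, you can also watch for the cancellation message. The sketch below assumes the Python SDK surfaces it as tool_call_cancellation with an ids list; check the API reference for the exact shape.
Python
# Sketch (field names assumed; verify against the API reference): handle the
# IDs of function calls that were canceled because of an interruption.
async for response in session.receive():
    if response.tool_call_cancellation:
        for call_id in response.tool_call_cancellation.ids:
            # Discard any local work or pending results tied to this call ID.
            print("Canceled function call:", call_id)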
Automatic VAD
By default, the model automatically performs VAD on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration.
When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time.
Python
# example audio file to try:
# URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
# !wget -q $URL -O sample.pcm
import asyncio
from pathlib import Path
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_bytes = Path("sample.pcm").read_bytes()
        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )
        # if stream gets paused, send:
        # await session.send_realtime_input(audio_stream_end=True)
        async for response in session.receive():
            if response.text is not None:
                print(response.text)
if __name__ == "__main__":
    asyncio.run(main())
JavaScript
// example audio file to try:
// URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
// !wget -q $URL -O sample.pcm
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
  const responseQueue = [];
  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }
  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }
  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });
  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("sample.pcm");
  const base64Audio = Buffer.from(fileBuffer).toString('base64');
  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }
  );
  // if stream gets paused, send:
  // session.sendRealtimeInput({ audioStreamEnd: true })
  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
  }
  session.close();
}
async function main() {
  await live().catch((e) => console.error('got error', e));
}
main();
When send_realtime_input is used, the API responds to audio automatically based on VAD. While send_client_content adds messages to the model context in order, send_realtime_input is optimized for responsiveness at the expense of deterministic ordering.
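To make the difference concrete, the following sketch contrasts the two send paths (the text and audio payloads are placeholders, and session is assumed to be open):
Python
# Illustrative contrast of the two send paths; the text and audio_bytes
# payloads are placeholders.
from google.genai import types

# send_client_content: the turn is appended to the model context in order.
await session.send_client_content(
    turns={"role": "user", "parts": [{"text": "Please remember that my name is Ada."}]},
    turn_complete=True,
)

# send_realtime_input: optimized for latency; the model reacts to the audio as
# soon as VAD detects activity, without deterministic ordering guarantees.
await session.send_realtime_input(
    audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
)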
Automatic VAD configuration
For more control over the VAD activity, you can configure the following parameters. See the API reference for more information.
Python
from google.genai import types
config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False, # default
            "start_of_speech_sensitivity": types.StartSensitivity.START_SENSITIVITY_LOW,
            "end_of_speech_sensitivity": types.EndSensitivity.END_SENSITIVITY_LOW,
            "prefix_padding_ms": 20,
            "silence_duration_ms": 100,
        }
    }
}
JavaScript
import { GoogleGenAI, Modality, StartSensitivity, EndSensitivity } from '@google/genai';
const config = {
  responseModalities: [Modality.TEXT],
  realtimeInputConfig: {
    automaticActivityDetection: {
      disabled: false, // default
      startOfSpeechSensitivity: StartSensitivity.START_SENSITIVITY_LOW,
      endOfSpeechSensitivity: EndSensitivity.END_SENSITIVITY_LOW,
      prefixPaddingMs: 20,
      silenceDurationMs: 100,
    }
  }
};
Disable automatic VAD
Alternatively, automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration, the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration; instead, any interruption of the stream is marked by an activityEnd message.
Python
config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}
async with client.aio.live.connect(model=model, config=config) as session:
    # ...
    await session.send_realtime_input(activity_start=types.ActivityStart())
    await session.send_realtime_input(
        audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
    )
    await session.send_realtime_input(activity_end=types.ActivityEnd())
    # ...
JavaScript
const config = {
  responseModalities: [Modality.TEXT],
  realtimeInputConfig: {
    automaticActivityDetection: {
      disabled: true,
    }
  }
};
session.sendRealtimeInput({ activityStart: {} })
session.sendRealtimeInput(
  {
    audio: {
      data: base64Audio,
      mimeType: "audio/pcm;rate=16000"
    }
  }
);
session.sendRealtimeInput({ activityEnd: {} })
Token count
You can find the total number of consumed tokens in the usageMetadata field of the returned server message.
Python
async for message in session.receive():
    # The server will periodically send messages that include UsageMetadata.
    if message.usage_metadata:
        usage = message.usage_metadata
        print(
            f"Used {usage.total_token_count} tokens in total. Response token breakdown:"
        )
        for detail in usage.response_tokens_details:
            match detail:
                case types.ModalityTokenCount(modality=modality, token_count=count):
                    print(f"{modality}: {count}")
JavaScript
const turns = await handleTurn();
for (const turn of turns) {
  if (turn.usageMetadata) {
    console.debug('Used %s tokens in total. Response token breakdown:\n', turn.usageMetadata.totalTokenCount);
    for (const detail of turn.usageMetadata.responseTokensDetails) {
      console.debug('%s\n', detail);
    }
  }
}
Media resolution
You can specify the media resolution for the input media by setting the mediaResolution field as part of the session configuration:
Python
from google.genai import types
config = {
    "response_modalities": ["AUDIO"],
    "media_resolution": types.MediaResolution.MEDIA_RESOLUTION_LOW,
}
JavaScript
import { GoogleGenAI, Modality, MediaResolution } from '@google/genai';
const config = {
    responseModalities: [Modality.TEXT],
    mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW,
};
Limitations
Consider the following limitations of the Live API when you plan your project.
Response modalities
You can only set one response modality (TEXT or AUDIO) per session in the session configuration. Setting both results in a config error message. This means that you can configure the model to respond with either text or audio, but not both in the same session.
Client authentication
The Live API only provides server-to-server authentication by default. If you're implementing your Live API application using a client-to-server approach, you need to use ephemeral tokens to mitigate security risks.
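As a rough sketch of how an ephemeral token might be minted on your backend (the auth_tokens.create method and its config fields are assumptions here; confirm the exact API in the ephemeral tokens documentation):
Python
# Sketch only: the method name and config fields are assumptions; confirm them
# in the ephemeral tokens documentation before relying on this.
import datetime
from google import genai

client = genai.Client(http_options={"api_version": "v1alpha"})
now = datetime.datetime.now(tz=datetime.timezone.utc)

token = client.auth_tokens.create(
    config={
        "uses": 1,  # allow the token to start a single session
        "expire_time": now + datetime.timedelta(minutes=30),
    }
)
# Pass the returned token to the browser or mobile client, which uses it in
# place of your long-lived API key when connecting to the Live API.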
Session duration
Audio-only sessions are limited to 15 minutes, and audio-plus-video sessions are limited to 2 minutes. However, you can configure different session management techniques to extend sessions without limit.
Context window
A session has a context window limit of:
- 128k tokens for native audio output models
- 32k tokens for other Live API models
Supported languages
The Live API supports the following languages.
| Language | BCP-47 Code | Language | BCP-47 Code |
|---|---|---|---|
| German (Germany) | de-DE | English (Australia)* | en-AU |
| English (UK)* | en-GB | English (India) | en-IN |
| English (US) | en-US | Spanish (US) | es-US |
| French (France) | fr-FR | Hindi (India) | hi-IN |
| Portuguese (Brazil) | pt-BR | Arabic (Generic) | ar-XA |
| Spanish (Spain)* | es-ES | French (Canada)* | fr-CA |
| Indonesian (Indonesia) | id-ID | Italian (Italy) | it-IT |
| Japanese (Japan) | ja-JP | Turkish (Turkey) | tr-TR |
| Vietnamese (Vietnam) | vi-VN | Bengali (India) | bn-IN |
| Gujarati (India)* | gu-IN | Kannada (India)* | kn-IN |
| Marathi (India) | mr-IN | Malayalam (India)* | ml-IN |
| Tamil (India) | ta-IN | Telugu (India) | te-IN |
| Dutch (Netherlands) | nl-NL | Korean (South Korea) | ko-KR |
| Mandarin Chinese (China)* | cmn-CN | Polish (Poland) | pl-PL |
| Russian (Russia) | ru-RU | Thai (Thailand) | th-TH |
Languages marked with an asterisk (*) are not available for native audio.
What's next
- Read the Tool use and Session management guides for essential information on using the Live API effectively.
- Try the Live API in Google AI Studio.
- For more info about the Live API models, see Gemini 2.0 Flash Live and Gemini 2.5 Flash Native Audio on the Models page.
- For more examples, check out the Live API cookbook, the Live API Tools cookbook, and the Live API Get Started script.