Live API

The Live API enables low-latency, two-way voice and video interactions with Gemini, so you can talk to Gemini live while also streaming video input or sharing your screen. With the Live API, you can give end users a natural, human-like voice conversation experience.

You can try the Live API in Google AI Studio. To use the Live API in Google AI Studio, select Stream.

How the Live API works

Streaming

The Live API uses a streaming model over a WebSocket connection. When you interact with the API, a persistent connection is created. Your input (audio, video, or text) is streamed continuously to the model, and the model's response (text or audio) is streamed back in real time over the same connection.

This bidirectional streaming keeps latency low and supports features such as voice activity detection, tool use, and speech generation.

Live API Overview

For more information about the underlying WebSockets API, see the WebSockets API reference.

Output generation

The Live API processes multimodal input (text, audio, video) to generate text or audio in real time. It has a built-in mechanism for generating audio and, depending on the model version you use, it uses one of two audio generation methods:

Half cascade: The model receives native audio input and uses a specialized cascade of distinct models to process the input and generate audio output.
Native: Gemini 2.5 introduces native audio generation, which generates audio output directly, providing more natural-sounding audio, more expressive voices, more awareness of additional context (for example, tone), and more proactive responses.

Building with the Live API

Before you begin building with the Live API, choose the audio generation approach that best fits your needs.

Establishing a connection

The following examples (Python first, then JavaScript) show how to create a connection with an API key:

import asyncio
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

model = "gemini-2.0-flash-live-001"
config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print("Session started")

if __name__ == "__main__":
    asyncio.run(main())

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });
const model = 'gemini-2.0-flash-live-001';
const config = { responseModalities: [Modality.TEXT] };

async function main() {

    const session = await ai.live.connect({
        model: model,
        callbacks: {
            onopen: function () {
                console.debug('Opened');
            },
            onmessage: function (message) {
                console.debug(message);
            },
            onerror: function (e) {
                console.debug('Error:', e.message);
            },
            onclose: function (e) {
                console.debug('Close:', e.reason);
            },
        },
        config: config,
    });

    // Send content...

    session.close();
}

main();

Send and receive text

The following examples (Python first, then JavaScript) show how to send and receive text:

import asyncio
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello, how are you?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.text is not None:
                print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });
const model = 'gemini-2.0-flash-live-001';
const config = { responseModalities: [Modality.TEXT] };

async function live() {
    const responseQueue = [];

    async function waitMessage() {
        let done = false;
        let message = undefined;
        while (!done) {
            message = responseQueue.shift();
            if (message) {
                done = true;
            } else {
                await new Promise((resolve) => setTimeout(resolve, 100));
            }
        }
        return message;
    }

    async function handleTurn() {
        const turns = [];
        let done = false;
        while (!done) {
            const message = await waitMessage();
            turns.push(message);
            if (message.serverContent && message.serverContent.turnComplete) {
                done = true;
            }
        }
        return turns;
    }

    const session = await ai.live.connect({
        model: model,
        callbacks: {
            onopen: function () {
                console.debug('Opened');
            },
            onmessage: function (message) {
                responseQueue.push(message);
            },
            onerror: function (e) {
                console.debug('Error:', e.message);
            },
            onclose: function (e) {
                console.debug('Close:', e.reason);
            },
        },
        config: config,
    });

    const simple = 'Hello how are you?';
    session.sendClientContent({ turns: simple });

    const turns = await handleTurn();
    for (const turn of turns) {
        if (turn.text) {
            console.debug('Received text: %s\n', turn.text);
        }
        else if (turn.data) {
            console.debug('Received inline data: %s\n', turn.data);
        }
    }

    session.close();
}

async function main() {
    await live().catch((e) => console.error('got error', e));
}

main();

Send and receive audio

You can send audio by converting it to 16-bit PCM, 16 kHz, mono format. The following examples (Python first, then JavaScript) read a WAV file and send it in the correct format:

# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
from google import genai
from google.genai import types
import soundfile as sf
import librosa

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:

        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format='RAW', subtype='PCM_16')
        buffer.seek(0)
        audio_bytes = buffer.read()

        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        async for response in session.receive():
            if response.text is not None:
                print(response.text)

if __name__ == "__main__":
    asyncio.run(main())

// Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
// Install helpers for converting files: npm install wavefile
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });
const model = 'gemini-2.0-flash-live-001';
const config = { responseModalities: [Modality.TEXT] };

async function live() {
    const responseQueue = [];

    async function waitMessage() {
        let done = false;
        let message = undefined;
        while (!done) {
            message = responseQueue.shift();
            if (message) {
                done = true;
            } else {
                await new Promise((resolve) => setTimeout(resolve, 100));
            }
        }
        return message;
    }

    async function handleTurn() {
        const turns = [];
        let done = false;
        while (!done) {
            const message = await waitMessage();
            turns.push(message);
            if (message.serverContent && message.serverContent.turnComplete) {
                done = true;
            }
        }
        return turns;
    }

    const session = await ai.live.connect({
        model: model,
        callbacks: {
            onopen: function () {
                console.debug('Opened');
            },
            onmessage: function (message) {
                responseQueue.push(message);
            },
            onerror: function (e) {
                console.debug('Error:', e.message);
            },
            onclose: function (e) {
                console.debug('Close:', e.reason);
            },
        },
        config: config,
    });

    // Send Audio Chunk
    const fileBuffer = fs.readFileSync("sample.wav");

    // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
    const wav = new WaveFile();
    wav.fromBuffer(fileBuffer);
    wav.toSampleRate(16000);
    wav.toBitDepth("16");
    const base64Audio = wav.toBase64();

    // If already in correct format, you can use this:
    // const fileBuffer = fs.readFileSync("sample.pcm");
    // const base64Audio = Buffer.from(fileBuffer).toString('base64');

    session.sendRealtimeInput(
        {
            audio: {
                data: base64Audio,
                mimeType: "audio/pcm;rate=16000"
            }
        }

    );

    const turns = await handleTurn();
    for (const turn of turns) {
        if (turn.text) {
            console.debug('Received text: %s\n', turn.text);
        }
        else if (turn.data) {
            console.debug('Received inline data: %s\n', turn.data);
        }
    }

    session.close();
}

async function main() {
    await live().catch((e) => console.error('got error', e));
}

main();

You can receive audio by setting AUDIO as the response modality. The following examples (Python first, then JavaScript) save the received data as a WAV file:

import asyncio
import wave
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["AUDIO"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        wf = wave.open("audio.wav", "wb")
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)

        message = "Hello how are you?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.data is not None:
                wf.writeframes(response.data)

            # Un-comment this code to print audio data info
            # if response.server_content.model_turn is not None:
            #      print(response.server_content.model_turn.parts[0].inline_data.mime_type)

        wf.close()

if __name__ == "__main__":
    asyncio.run(main())

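// Install helpers for writing WAV files: npm install wavefile
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;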

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });
const model = 'gemini-2.0-flash-live-001';
const config = { responseModalities: [Modality.AUDIO] };

async function live() {
    const responseQueue = [];

    async function waitMessage() {
        let done = false;
        let message = undefined;
        while (!done) {
            message = responseQueue.shift();
            if (message) {
                done = true;
            } else {
                await new Promise((resolve) => setTimeout(resolve, 100));
            }
        }
        return message;
    }

    async function handleTurn() {
        const turns = [];
        let done = false;
        while (!done) {
            const message = await waitMessage();
            turns.push(message);
            if (message.serverContent && message.serverContent.turnComplete) {
                done = true;
            }
        }
        return turns;
    }

    const session = await ai.live.connect({
        model: model,
        callbacks: {
            onopen: function () {
                console.debug('Opened');
            },
            onmessage: function (message) {
                responseQueue.push(message);
            },
            onerror: function (e) {
                console.debug('Error:', e.message);
            },
            onclose: function (e) {
                console.debug('Close:', e.reason);
            },
        },
        config: config,
    });

    const simple = 'Hello how are you?';
    session.sendClientContent({ turns: simple });

    const turns = await handleTurn();

    // Combine audio data strings and save as wave file
    const combinedAudio = turns.reduce((acc, turn) => {
        if (turn.data) {
            const buffer = Buffer.from(turn.data, 'base64');
            const intArray = new Int16Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / Int16Array.BYTES_PER_ELEMENT);
            return acc.concat(Array.from(intArray));
        }
        return acc;
    }, []);

    const audioBuffer = new Int16Array(combinedAudio);

    const wf = new WaveFile();
    wf.fromScratch(1, 24000, '16', audioBuffer);
    fs.writeFileSync('output.wav', wf.toBuffer());

    session.close();
}

async function main() {
    await live().catch((e) => console.error('got error', e));
}

main();

Audio formats

Audio data in the Live API is always raw, little-endian, 16-bit PCM. Audio output always uses a sample rate of 24 kHz. Input audio is natively 16 kHz, but the Live API resamples if needed, so any sample rate can be sent. To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value such as audio/pcm;rate=16000.
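
The format rules above can be sketched with a few standard-library helpers. This is a minimal illustration, not part of the SDK: the function names are hypothetical, and the code assumes mono audio supplied as floats in the range [-1.0, 1.0].

```python
import struct

BYTES_PER_SAMPLE = 2  # 16-bit PCM, as stated above


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to raw little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # clip out-of-range values
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)  # "<" forces little-endian


def pcm_mime_type(rate_hz):
    """Build the MIME type that conveys the input sample rate."""
    return f"audio/pcm;rate={rate_hz}"


def pcm_duration_seconds(num_bytes, rate_hz):
    """Duration of a mono 16-bit PCM buffer of the given size."""
    return num_bytes / (BYTES_PER_SAMPLE * rate_hz)
```

For example, pcm_duration_seconds(len(data), 24000) gives the length in seconds of a mono output buffer, because output is always 24 kHz.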

Get audio transcriptions

You can enable transcription of the model's audio output by sending output_audio_transcription in the setup config. The transcription language is inferred from the model's response.

import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["AUDIO"],
          "output_audio_transcription": {}
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello? Gemini are you there?"

        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            # server_content can be None on messages that carry no model output
            if response.server_content is None:
                continue
            if response.server_content.model_turn:
                print("Model turn:", response.server_content.model_turn)
            if response.server_content.output_transcription:
                print("Transcript:", response.server_content.output_transcription.text)


if __name__ == "__main__":
    asyncio.run(main())

You can enable transcription of the audio input by sending input_audio_transcription in the setup config.

import asyncio
from pathlib import Path  # needed for Path("sample.pcm") below
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["TEXT"],
    "realtime_input_config": {
        "automatic_activity_detection": {"disabled": True},
        "activity_handling": "NO_INTERRUPTION",
    },
    "input_audio_transcription": {},
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_data = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(activity_start=types.ActivityStart())
        await session.send_realtime_input(
            audio=types.Blob(data=audio_data, mime_type='audio/pcm;rate=16000')
        )
        await session.send_realtime_input(activity_end=types.ActivityEnd())

        async for msg in session.receive():
            # server_content can be None on messages that carry no model output
            if msg.server_content and msg.server_content.input_transcription:
                print('Transcript:', msg.server_content.input_transcription.text)

if __name__ == "__main__":
    asyncio.run(main())

Stream audio and video

System instructions

System instructions let you steer the behavior of a model based on your specific needs and use cases. System instructions can be set in the setup configuration and remain in effect for the entire session.

from google.genai import types

config = {
    "system_instruction": types.Content(
        parts=[
            types.Part(
                text="You are a helpful assistant and answer in a friendly tone."
            )
        ]
    ),
    "response_modalities": ["TEXT"],
}

Incremental content updates

Use incremental updates to send text input, establish session context, or restore session context. For short contexts, you can send turn-by-turn interactions to represent the exact sequence of events. The Python example comes first, followed by the equivalent JSON message:

turns = [
    {"role": "user", "parts": [{"text": "What is the capital of France?"}]},
    {"role": "model", "parts": [{"text": "Paris"}]},
]

await session.send_client_content(turns=turns, turn_complete=False)

turns = [{"role": "user", "parts": [{"text": "What is the capital of Germany?"}]}]

await session.send_client_content(turns=turns, turn_complete=True)

{
  "clientContent": {
    "turns": [
      {
        "parts":[
          {
            "text": ""
          }
        ],
        "role":"user"
      },
      {
        "parts":[
          {
            "text": ""
          }
        ],
        "role":"model"
      }
    ],
    "turnComplete": true
  }
}

For longer contexts, it's recommended to provide a single message summary to free up the context window for subsequent interactions.
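
As a rough sketch of both approaches, the helpers below build the clientContent JSON message shown above and wrap a summary in a single turn. The helper names are hypothetical, and sending the summary with the user role is an assumption for illustration rather than something this guide specifies.

```python
import json

VALID_ROLES = ("user", "model")


def client_content_message(turns, turn_complete=True):
    """Serialize turns into the clientContent JSON message shape shown above."""
    for turn in turns:
        if turn.get("role") not in VALID_ROLES:
            raise ValueError(f"unexpected role: {turn.get('role')!r}")
    return json.dumps(
        {"clientContent": {"turns": turns, "turnComplete": turn_complete}}
    )


def summary_turns(summary_text):
    """Wrap one summary message in a single turn (role 'user' is an assumption)."""
    return [{"role": "user", "parts": [{"text": summary_text}]}]
```

With the Python SDK you would pass the same turns list to session.send_client_content instead of serializing it yourself.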

Change voice and language

The Live API supports the following voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr.

To specify a voice, set the voice name within the speechConfig object as part of the session configuration. The Python example comes first, followed by the equivalent JSON:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    )
)

{
  "voiceConfig": {
    "prebuiltVoiceConfig": {
      "voiceName": "Kore"
    }
  }
}
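
Because the voice name is a plain string, a typo only surfaces once the session is configured. A small guard like the following can catch it earlier. This is a sketch: the helper name is hypothetical, and the voice list is copied from the list above, so it may go stale as voices are added.

```python
# Voices listed in this guide; treat the tuple as a snapshot, not a source of truth.
SUPPORTED_VOICES = (
    "Puck", "Charon", "Kore", "Fenrir", "Aoede", "Leda", "Orus", "Zephyr",
)


def voice_config(voice_name):
    """Return the voiceConfig JSON fragment, rejecting unknown voice names."""
    if voice_name not in SUPPORTED_VOICES:
        raise ValueError(
            f"unknown voice {voice_name!r}; expected one of {SUPPORTED_VOICES}"
        )
    return {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}}
```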

The Live API supports multiple languages.

To change the language, set the language code within the speechConfig object as part of the session configuration:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        language_code="de-DE",
    )
)

Native audio output

Through the Live API, you can also access models that allow native audio output in addition to native audio input. This allows for higher-quality audio output with better pacing, voice naturalness, verbosity, and mood.

The following native audio models support native audio output:

gemini-2.5-flash-preview-native-audio-dialog
gemini-2.5-flash-exp-native-audio-thinking-dialog

How to use native audio output

To use native audio output, configure one of the native audio models and set response_modalities to AUDIO.

See Send and receive audio for a full example.

The snippets below show the Python configuration first, then the JavaScript equivalent:

model = "gemini-2.5-flash-preview-native-audio-dialog"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])

async with client.aio.live.connect(model=model, config=config) as session:
    # Send audio input and receive audio
    ...

const model = 'gemini-2.5-flash-preview-native-audio-dialog';
const config = { responseModalities: [Modality.AUDIO] };

async function main() {

    const session = await ai.live.connect({
        model: model,
        config: config,
        callbacks: ...,
    });

    // Send audio input and receive audio

    session.close();
}

main();

Affective dialog

This feature lets Gemini adapt its response style to the input expression and tone.

To use affective dialog, set the API version to v1alpha and set enable_affective_dialog to true in the setup message. The Python snippet comes first, then the JavaScript equivalent:

client = genai.Client(api_key="GOOGLE_API_KEY", http_options={"api_version": "v1alpha"})

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True
)

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY", httpOptions: {"apiVersion": "v1alpha"} });

const config = {
    responseModalities: [Modality.AUDIO],
    enableAffectiveDialog: true
};

Note that affective dialog is currently supported only by the native audio output models.

Proactive audio

When this feature is enabled, Gemini can proactively decide not to respond if the content is not relevant.

To use it, set the API version to v1alpha, configure the proactivity field in the setup message, and set proactive_audio to true. The Python snippet comes first, then the JavaScript equivalent:

client = genai.Client(api_key="GOOGLE_API_KEY", http_options={"api_version": "v1alpha"})

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity={'proactive_audio': True}
)

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY", httpOptions: {"apiVersion": "v1alpha"} });

const config = {
    responseModalities: [Modality.AUDIO],
    proactivity: { proactiveAudio: true }
}

Note that proactive audio is currently supported only by the native audio output models.

Native audio output with thinking

Native audio output supports thinking capabilities, available through a separate model, gemini-2.5-flash-exp-native-audio-thinking-dialog.

See Send and receive audio for a full example.

The snippets below show the Python configuration first, then the JavaScript equivalent:

model = "gemini-2.5-flash-exp-native-audio-thinking-dialog"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])

async with client.aio.live.connect(model=model, config=config) as session:
    # Send audio input and receive audio
    ...

const model = 'gemini-2.5-flash-exp-native-audio-thinking-dialog';
const config = { responseModalities: [Modality.AUDIO] };

async function main() {

    const session = await ai.live.connect({
        model: model,
        config: config,
        callbacks: ...,
    });

    // Send audio input and receive audio

    session.close();
}

main();

Tool use with the Live API

You can define tools such as Function calling, Code execution, and Google Search with the Live API.

Overview of supported tools

Here's a brief overview of the tools available for each model:

Tool	Cascaded models `gemini-2.0-flash-live-001`	`gemini-2.5-flash-preview-native-audio-dialog`	`gemini-2.5-flash-exp-native-audio-thinking-dialog`
Search	Yes	Yes	Yes
Function calling	Yes	Yes	No
Code execution	Yes	No	No
URL context	Yes	No	No
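
If your application lets users pick a model, it can help to encode the table above and check a tool list before opening a session. The sketch below does that with a plain dictionary. The tool keys and function name are hypothetical labels chosen for this example, and the data is a snapshot of the table, so it needs updating whenever the table changes.

```python
# Snapshot of the support table above (True = supported).
TOOL_SUPPORT = {
    "gemini-2.0-flash-live-001": {
        "search": True, "function_calling": True,
        "code_execution": True, "url_context": True,
    },
    "gemini-2.5-flash-preview-native-audio-dialog": {
        "search": True, "function_calling": True,
        "code_execution": False, "url_context": False,
    },
    "gemini-2.5-flash-exp-native-audio-thinking-dialog": {
        "search": True, "function_calling": False,
        "code_execution": False, "url_context": False,
    },
}


def unsupported_tools(model, requested):
    """Return the requested tools that the table marks as unsupported for model."""
    support = TOOL_SUPPORT[model]  # raises KeyError for models not in the table
    return [tool for tool in requested if not support.get(tool, False)]
```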

Function calling

You can define function declarations as part of the session configuration. See the Function calling tutorial to learn more.

After receiving tool calls, the client should respond with a list of FunctionResponse objects using the session.send_tool_response method.

import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

# Simple function definitions
turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

tools = [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Turn on the lights please"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_response = types.FunctionResponse(
                        id=fc.id,
                        name=fc.name,
                        response={ "result": "ok" } # simple, hard-coded function response
                    )
                    function_responses.append(function_response)

                await session.send_tool_response(function_responses=function_responses)


if __name__ == "__main__":
    asyncio.run(main())

From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages.

Asynchronous function calling

By default, execution pauses until the results of each function call are available, which ensures sequential processing. This means you can't continue interacting with the model while the functions are running.

If you don't want to block the conversation, you can tell the model to run the functions asynchronously.

To do so, first add a behavior to the function definitions:

# Non-blocking function definitions
turn_on_the_lights = {"name": "turn_on_the_lights", "behavior": "NON_BLOCKING"}  # runs asynchronously
turn_off_the_lights = {"name": "turn_off_the_lights"}  # still pauses all interactions with the model

NON_BLOCKING ensures the function runs asynchronously while you continue interacting with the model.

Then you need to tell the model how to behave when it receives the FunctionResponse, using the scheduling parameter. It can either:

Interrupt what it's doing and tell you about the response it got right away (scheduling="INTERRUPT"),
Wait until it has finished what it's currently doing (scheduling="WHEN_IDLE"),
Or do nothing and use that knowledge later in the discussion (scheduling="SILENT").

# Function response that tells the model how to schedule the result
function_response = types.FunctionResponse(
    id=fc.id,
    name=fc.name,
    response={
        "result": "ok",
        "scheduling": "INTERRUPT"  # Can also be WHEN_IDLE or SILENT
    }
)
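
To avoid misspelling the scheduling value, you could build the response payload through a small validating helper. This is only a sketch with a hypothetical function name. It returns a plain dictionary whose keys mirror the id, name, and response arguments used above, so it does not depend on the SDK.

```python
VALID_SCHEDULING = ("INTERRUPT", "WHEN_IDLE", "SILENT")


def function_response_payload(call_id, name, result, scheduling=None):
    """Build FunctionResponse keyword arguments, validating scheduling if given."""
    response = {"result": result}
    if scheduling is not None:
        if scheduling not in VALID_SCHEDULING:
            raise ValueError(f"scheduling must be one of {VALID_SCHEDULING}")
        response["scheduling"] = scheduling
    return {"id": call_id, "name": name, "response": response}
```

The resulting dictionary could then be unpacked into the constructor, for example types.FunctionResponse(**payload), assuming the field names shown in the example above.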

Code execution

You can define code execution as part of the session configuration. See the Code execution tutorial to learn more.

import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

tools = [{'code_execution': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Compute the largest prime palindrome under 100000."
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)

                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())

Grounding with Google Search

You can enable Grounding with Google Search as part of the session configuration. See the Grounding tutorial to learn more.

import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

tools = [{'google_search': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "When did the last Brazil vs. Argentina soccer match happen?"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                # The model might generate and execute Python code to use Search
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)

                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())

Combining multiple tools

You can combine multiple tools within the Live API:

prompt = """
Hey, I need you to do three things for me.

1. Compute the largest prime palindrome under 100000.
2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024?
3. Turn on the lights

Thanks!
"""

tools = [
    {"google_search": {}},
    {"code_execution": {}},
    {"function_declarations": [turn_on_the_lights, turn_off_the_lights]},
]

config = {"response_modalities": ["TEXT"], "tools": tools}

Handling interruptions

Users can interrupt the model's output at any time. When voice activity detection (VAD) detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message to report the interruption.

In addition, the Gemini server discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls.

async for response in session.receive():
    # server_content can be None on messages that carry no model output
    if response.server_content and response.server_content.interrupted:
        # The generation was interrupted
        ...
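
On the client side, an interruption usually means that audio already received but not yet played should be dropped. The following is one possible sketch of that bookkeeping. The class and function names are hypothetical, and actual playback is left out.

```python
from collections import deque


class PlaybackBuffer:
    """Holds received audio chunks until the player consumes them."""

    def __init__(self):
        self._chunks = deque()

    def add(self, chunk):
        self._chunks.append(chunk)

    def next_chunk(self):
        """Return the oldest queued chunk, or None when the buffer is empty."""
        return self._chunks.popleft() if self._chunks else None

    def clear(self):
        """Drop all queued audio and report how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped


def handle_server_event(buffer, audio=None, interrupted=False):
    """Queue audio, or flush the buffer when the server reports an interruption."""
    if interrupted:
        return buffer.clear()
    if audio is not None:
        buffer.add(audio)
    return 0
```

Inside the receive loop, you would call handle_server_event with response.data and the interrupted flag from each message.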

Voice activity detection (VAD)

You can configure or disable voice activity detection (VAD).

Using automatic VAD

By default, the model automatically performs VAD on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration.

When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time.

# example audio file to try:
# URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
# !wget -q $URL -O sample.pcm
import asyncio
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        # if stream gets paused, send:
        # await session.send_realtime_input(audio_stream_end=True)

        async for response in session.receive():
            if response.text is not None:
                print(response.text)

if __name__ == "__main__":
    asyncio.run(main())

With send_realtime_input, the API responds to audio automatically based on VAD. While send_client_content adds messages to the model context in order, send_realtime_input is optimized for responsiveness at the expense of deterministic ordering.
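
The rule about sending audioStreamEnd after a pause of more than a second can be tracked with a small timer object. The sketch below is one way to do it. The class name is hypothetical, and timestamps are passed in explicitly (for example from time.monotonic()) so the logic stays easy to test.

```python
class StreamPauseTracker:
    """Decides when a paused audio stream warrants an audioStreamEnd event."""

    def __init__(self, pause_threshold_s=1.0):
        self.pause_threshold_s = pause_threshold_s
        self._last_audio_at = None  # timestamp of the most recent audio chunk
        self._end_sent = False      # avoid signalling the same pause twice

    def on_audio_sent(self, now):
        """Record that an audio chunk was sent at time `now` (seconds)."""
        self._last_audio_at = now
        self._end_sent = False

    def should_send_stream_end(self, now):
        """Return True once per pause that exceeds the threshold."""
        if self._last_audio_at is None or self._end_sent:
            return False
        if now - self._last_audio_at > self.pause_threshold_s:
            self._end_sent = True
            return True
        return False
```

When should_send_stream_end returns True, the client would call session.send_realtime_input(audio_stream_end=True), as in the commented line of the example above.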

Configure automatic VAD

For more control over VAD behavior, you can configure the following parameters. See the API reference for more information.

from google.genai import types

config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False, # default
            "start_of_speech_sensitivity": types.StartSensitivity.START_SENSITIVITY_LOW,
            "end_of_speech_sensitivity": types.EndSensitivity.END_SENSITIVITY_LOW,
            "prefix_padding_ms": 20,
            "silence_duration_ms": 100,
        }
    }
}

Disable automatic VAD

Alternatively, you can disable automatic VAD by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration, the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message.

config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}

async with client.aio.live.connect(model=model, config=config) as session:
    # ...
    await session.send_realtime_input(activity_start=types.ActivityStart())
    await session.send_realtime_input(
        audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
    )
    await session.send_realtime_input(activity_end=types.ActivityEnd())
    # ...
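With automatic VAD disabled, the client has to decide where speech starts and ends. The sketch below shows one simple way to do that, an energy threshold over 16-bit PCM chunks. It is an assumption chosen for illustration, not the detector the server uses. The function names, the threshold of 500, and the hangover of three quiet chunks are all hypothetical values that you would tune for your audio.

```python
import array
import math


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of 16-bit PCM in native byte order."""
    usable = len(chunk) - len(chunk) % 2  # drop a trailing odd byte, if any
    samples = array.array("h", chunk[:usable])
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_activity(chunks, threshold=500.0, hangover_chunks=3):
    """Yield ("start" | "audio" | "end", chunk) events for a stream of PCM chunks.

    Speech starts at the first chunk whose RMS reaches the threshold and ends
    after hangover_chunks consecutive quiet chunks.
    """
    speaking = False
    quiet = 0
    for chunk in chunks:
        loud = rms(chunk) >= threshold
        if not speaking:
            if loud:
                speaking = True
                quiet = 0
                yield ("start", None)
                yield ("audio", chunk)
        else:
            yield ("audio", chunk)
            quiet = 0 if loud else quiet + 1
            if quiet >= hangover_chunks:
                speaking = False
                yield ("end", None)
    if speaking:
        # The stream stopped mid-utterance: close the activity explicitly.
        yield ("end", None)
```

Each "start" event would map to send_realtime_input(activity_start=types.ActivityStart()), each "audio" event to a send_realtime_input(audio=...) call, and each "end" event to send_realtime_input(activity_end=types.ActivityEnd()), as in the sample above.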

Token count

You can find the total number of consumed tokens in the usageMetadata field of the returned server message.

async for message in session.receive():
    # The server will periodically send messages that include UsageMetadata.
    if message.usage_metadata:
        usage = message.usage_metadata
        print(
            f"Used {usage.total_token_count} tokens in total. Response token breakdown:"
        )
        # response_tokens_details may be unset, so fall back to an empty list.
        for detail in usage.response_tokens_details or []:
            match detail:
                case types.ModalityTokenCount(modality=modality, token_count=count):
                    print(f"{modality}: {count}")

Extend the session duration

You can extend the maximum session duration to unlimited with two mechanisms:

Context window compression
Session resumption

In addition, you receive a GoAway message before the session ends, which lets you take further action.

Context window compression

To enable longer sessions and avoid abrupt disconnection, you can turn on context window compression by setting the contextWindowCompression field in the session configuration.

In ContextWindowCompressionConfig, you can configure a sliding-window mechanism and the number of tokens that triggers compression.

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    context_window_compression=(
        # Configures compression with default parameters.
        types.ContextWindowCompressionConfig(
            sliding_window=types.SlidingWindow(),
        )
    ),
)
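To make the sliding-window idea concrete, here is a toy, client-side illustration of the general technique: once the context grows past a trigger size, the oldest turns are dropped until the total fits a target size. This is only a conceptual sketch. The source does not describe the server's actual compression algorithm, and the function name and the trigger_tokens and target_tokens parameters are hypothetical names used for this example.

```python
from collections import deque


def compress_context(turns, trigger_tokens, target_tokens):
    """Drop the oldest turns once the total exceeds trigger_tokens.

    turns is a list of (text, token_count) pairs, oldest first.
    The most recent turn is always kept.
    """
    window = deque(turns)
    total = sum(count for _, count in window)
    if total <= trigger_tokens:
        return list(window)  # below the trigger: nothing to do
    while len(window) > 1 and total > target_tokens:
        _, dropped = window.popleft()  # discard the oldest turn
        total -= dropped
    return list(window)
```

The design point is that compression only starts when the trigger is crossed and then shrinks the window to a smaller target, so it does not run on every turn.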

Session resumption

To prevent the session from being terminated when the server periodically resets the WebSocket connection, configure the sessionResumption field in the setup configuration.

Passing this configuration causes the server to send SessionResumptionUpdate messages. You can use them to resume the session by passing the most recent resumption token as the SessionResumptionConfig.handle of the next connection.

import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
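model = "gemini-2.0-flash-live-001"

# Handle saved from an earlier connection, or None to start a new session.
previous_session_handle = None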

async def main():
    print(f"Connecting to the service with handle {previous_session_handle}...")
    async with client.aio.live.connect(
        model=model,
        config=types.LiveConnectConfig(
            response_modalities=["AUDIO"],
            session_resumption=types.SessionResumptionConfig(
                # The handle of the session to resume is passed here,
                # or else None to start a new session.
                handle=previous_session_handle
            ),
        ),
    ) as session:
        while True:
            await session.send_client_content(
                turns=types.Content(
                    role="user", parts=[types.Part(text="Hello world!")]
                )
            )
            async for message in session.receive():
                # Periodically, the server will send update messages that may
                # contain a handle for the current state of the session.
                if message.session_resumption_update:
                    update = message.session_resumption_update
                    if update.resumable and update.new_handle:
                        # The handle should be retained and linked to the session.
                        return update.new_handle

                # For the purposes of this example, placeholder input is continually fed
                # to the model. In non-sample code, the model inputs would come from
                # the user.
                if message.server_content and message.server_content.turn_complete:
                    break

if __name__ == "__main__":
    asyncio.run(main())
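The example above returns the newest handle from a single connection. One way to chain connections is a small loop that feeds each returned handle into the next attempt. The sketch below assumes a hypothetical coroutine run_session(handle) that wraps one connection (for example, the body of main() above) and returns the latest handle it received, or None. Neither run_with_resumption nor run_session is part of the SDK.

```python
import asyncio


async def run_with_resumption(run_session, max_connections=3):
    """Reconnect repeatedly, passing the latest handle to each new connection.

    run_session(handle) is a hypothetical coroutine that opens one connection
    and returns the newest resumption handle it saw, or None if none arrived.
    """
    handle = None  # None starts a fresh session
    for _ in range(max_connections):
        new_handle = await run_session(handle)
        if new_handle is not None:
            handle = new_handle  # keep only the most recent handle
    return handle
```

Keeping only the most recent handle matches the description above, where the last resumption token is the one passed to the next connection.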

Receive a message before the session disconnects

The server sends a GoAway message to signal that the current connection will soon be terminated. This message includes timeLeft, which indicates the remaining time and lets you take further action before the connection is terminated as ABORTED.

async for response in session.receive():
    if response.go_away is not None:
        # The connection will soon be terminated
        print(response.go_away.time_left)
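To act on the remaining time, the client needs it as a number. The sketch below assumes that time_left arrives as a protobuf-style duration string such as "30s". That format is an assumption, so check the value your SDK version actually returns. The helper names and the five-second margin are hypothetical.

```python
def parse_duration_seconds(value: str) -> float:
    """Parse a duration string such as "30s" or "1.500s" into seconds."""
    text = value.strip()
    if not text.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(text[:-1])


def should_wrap_up(time_left: str, margin_s: float = 5.0) -> bool:
    """Return True when the remaining time is within the safety margin."""
    return parse_duration_seconds(time_left) <= margin_s
```

A client could call should_wrap_up(response.go_away.time_left) inside the loop above and, when it returns True, stop sending input and reconnect with the latest session resumption handle.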

Receive a message when the generation is complete

The server sends a generationComplete message to signal that the model has finished generating the response.

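async for response in session.receive():
    # server_content can be absent on messages that carry other payloads.
    if response.server_content and response.server_content.generation_complete:
        # The generation is complete
        print("Generation complete")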

Media resolution

You can specify the media resolution for the input media by setting the mediaResolution field in the session configuration:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,
)

Limitations

Consider the following limitations of the Live API when you plan your project.

Response modalities

You can set only one response modality (TEXT or AUDIO) per session in the session configuration. Setting both results in a configuration error message. This means that you can configure the model to respond with either text or audio, but not both in the same session.
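Because setting both modalities is rejected, it can be convenient to check the configuration locally before connecting. The following is a minimal sketch of such a check for dict-style configs like the ones used on this page. The function name is hypothetical, and this sketch expects response_modalities to be set explicitly.

```python
ALLOWED_MODALITIES = {"TEXT", "AUDIO"}


def single_response_modality(config: dict) -> str:
    """Return the one configured response modality, or raise ValueError."""
    modalities = config.get("response_modalities") or []
    if len(modalities) != 1:
        raise ValueError(
            f"expected exactly one response modality, got {modalities!r}"
        )
    modality = modalities[0]
    if modality not in ALLOWED_MODALITIES:
        raise ValueError(f"unsupported response modality: {modality!r}")
    return modality
```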

Client authentication

The Live API provides only server-to-server authentication and isn't recommended for direct client use. Client input should be routed through an intermediate application server for secure authentication with the Live API.

Session duration

You can extend the session duration to unlimited by enabling context window compression. Without compression, audio-only sessions are limited to 15 minutes, and audio plus video sessions are limited to 2 minutes. Exceeding these limits without compression terminates the connection.

Additionally, you can configure session resumption to let the client resume a session that was terminated.

Context window

A session has a context window limit of:

128k tokens for native audio output models
32k tokens for other Live API models

Supported languages

The Live API supports the following languages.

Language	BCP-47 code
German (Germany)	de-DE
English (Australia)	en-AU
English (United Kingdom)	en-GB
English (India)	en-IN
English (US)	en-US
Spanish (US)	es-US
French (France)	fr-FR
Hindi (India)	hi-IN
Portuguese (Brazil)	pt-BR
Arabic (Generic)	ar-XA
Spanish (Spain)	es-ES
French (Canada)	fr-CA
Indonesian (Indonesia)	id-ID
Italian (Italy)	it-IT
Japanese (Japan)	ja-JP
Turkish (Turkey)	tr-TR
Vietnamese (Vietnam)	vi-VN
Bengali (India)	bn-IN
Gujarati (India)	gu-IN
Kannada (India)	kn-IN
Malayalam (India)	ml-IN
Marathi (India)	mr-IN
Tamil (India)	ta-IN
Telugu (India)	te-IN
Dutch (Netherlands)	nl-NL
Korean (South Korea)	ko-KR
Mandarin Chinese (China)	cmn-CN
Polish (Poland)	pl-PL
Russian (Russia)	ru-RU
Thai (Thailand)	th-TH

Third-party integrations

For web and mobile app deployments, you can explore options from:

What's next

Try the Live API in Google AI Studio.
For more information about Gemini 2.0 Flash Live, see the model page.
Try more examples in the Live API cookbook, the Live API Tools cookbook, and the Live API Get Started script.