
Live API capabilities guide

This is a comprehensive guide to the capabilities and configurations available with the Live API. For an overview and sample code for common use cases, see the Get started with Live API page.

Before you begin

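Familiarize yourself with core concepts: If you haven't already, read the Get started with Live API page first. It introduces the fundamental principles of the Live API, how it works, and the different implementation approaches.
Try the Live API in AI Studio: You may find it useful to try the Live API in Google AI Studio before you start building. To use the Live API in Google AI Studio, select Stream.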

Establishing a connection

The following example shows how to create a connection with an API key:

Python

import asyncio
from google import genai

client = genai.Client()

model = "gemini-2.5-flash-native-audio-preview-12-2025"
config = {"response_modalities": ["AUDIO"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print("Session started")
        # Send content...

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({});
const model = 'gemini-2.5-flash-native-audio-preview-12-2025';
const config = { responseModalities: [Modality.AUDIO] };

async function main() {

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        console.debug(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  console.debug("Session started");
  // Send content...

  session.close();
}

main();

Interaction modalities

The following sections provide examples and supporting context for the different input and output modalities available in the Live API.

Sending and receiving audio

The most common audio example, audio-to-audio, is covered in the Get started guide.

Audio formats

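Audio data in the Live API is always raw, little-endian, 16-bit PCM. Audio output always uses a sample rate of 24 kHz. Input audio is natively 16 kHz, but the Live API resamples if needed, so any sample rate can be sent. To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value like audio/pcm;rate=16000.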
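The following is a minimal sketch of preparing input audio in this format and reasoning about output length. It uses only the Python standard library. It assumes mono audio (channels=1). The helper names (floats_to_pcm16le, pcm_mime_type, pcm_duration_seconds) are hypothetical and not part of the SDK.

```python
import struct

OUTPUT_SAMPLE_RATE = 24_000  # Live API audio output is always 24 kHz
BYTES_PER_SAMPLE = 2         # 16-bit PCM


def floats_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to raw little-endian 16-bit PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    # "<" forces little-endian regardless of the host byte order
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_mime_type(sample_rate):
    """MIME type that tells the Live API the input sample rate."""
    return f"audio/pcm;rate={sample_rate}"


def pcm_duration_seconds(pcm_bytes, sample_rate=OUTPUT_SAMPLE_RATE, channels=1):
    """Duration of raw 16-bit PCM audio (assumes mono unless told otherwise)."""
    frame_size = BYTES_PER_SAMPLE * channels
    return len(pcm_bytes) / (frame_size * sample_rate)
```

For example, one second of 24 kHz mono output is 24,000 samples at 2 bytes each, which is 48,000 bytes.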

Sending text

Here's how you can send text:

Python

message = "Hello, how are you?"
await session.send_client_content(turns=message, turn_complete=True)

JavaScript

const message = 'Hello, how are you?';
session.sendClientContent({ turns: message, turnComplete: true });

Incremental content updates

Use incremental updates to send text input, establish session context, or restore session context. For short contexts, you can send turn-by-turn interactions to represent the exact sequence of events:

Python

turns = [
    {"role": "user", "parts": [{"text": "What is the capital of France?"}]},
    {"role": "model", "parts": [{"text": "Paris"}]},
]

await session.send_client_content(turns=turns, turn_complete=False)

turns = [{"role": "user", "parts": [{"text": "What is the capital of Germany?"}]}]

await session.send_client_content(turns=turns, turn_complete=True)

JavaScript

let inputTurns = [
  { "role": "user", "parts": [{ "text": "What is the capital of France?" }] },
  { "role": "model", "parts": [{ "text": "Paris" }] },
]

session.sendClientContent({ turns: inputTurns, turnComplete: false })

inputTurns = [{ "role": "user", "parts": [{ "text": "What is the capital of Germany?" }] }]

session.sendClientContent({ turns: inputTurns, turnComplete: true })

For longer contexts, we recommend providing a single message summary to free up the context window for subsequent interactions. See Session resumption for another method of loading session context.
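To illustrate the two approaches, this sketch builds the turns payload from a local history. Short histories are replayed turn by turn. Longer ones are replaced by one summary message. Several details are assumptions:

- the threshold (max_turns);
- using the user role for the summary;
- how the summary text is produced.

The function name is hypothetical.

```python
def build_context_turns(history, summary=None, max_turns=6):
    """Return a `turns` list suitable for send_client_content.

    history: list of (role, text) tuples, where role is "user" or "model".
    Short histories are replayed turn by turn. If the history is longer
    than max_turns and a summary is supplied, a single summary message
    is returned instead to save context window space.
    """
    for role, _ in history:
        if role not in ("user", "model"):
            raise ValueError(f"unexpected role: {role!r}")

    if len(history) > max_turns and summary is not None:
        # Assumption: the summary is sent as one user turn.
        return [{"role": "user", "parts": [{"text": summary}]}]

    return [{"role": role, "parts": [{"text": text}]} for role, text in history]
```

Pass the returned list as turns= to session.send_client_content(), as in the examples above.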

Audio transcriptions

In addition to the model response, you can also receive transcriptions of both the audio output and the audio input.

To enable transcription of the model's audio output, send output_audio_transcription in the setup config. The transcription language is inferred from the model's response.

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-2.5-flash-native-audio-preview-12-2025"

config = {
    "response_modalities": ["AUDIO"],
    "output_audio_transcription": {}
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello? Gemini are you there?"

        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            content = response.server_content
            if content is None:
                continue  # e.g. messages that only carry usage metadata
            if content.model_turn:
                print("Model turn:", content.model_turn)
            if content.output_transcription:
                print("Transcript:", content.output_transcription.text)

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({});
const model = 'gemini-2.5-flash-native-audio-preview-12-2025';

const config = {
  responseModalities: [Modality.AUDIO],
  outputAudioTranscription: {}
};

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  const inputTurns = 'Hello how are you?';
  session.sendClientContent({ turns: inputTurns });

  const turns = await handleTurn();

  for (const turn of turns) {
    if (turn.serverContent && turn.serverContent.outputTranscription) {
      console.debug('Received output transcription: %s\n', turn.serverContent.outputTranscription.text);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();
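Both examples above handle each transcription message as it arrives. To get the full transcript of a turn, a small buffer like the following can collect the pieces. It is a hypothetical helper, not an SDK class. It assumes that the transcription for one turn may be split across several messages, and that each fragment already contains its own spacing.

```python
class TranscriptBuffer:
    """Collects transcription fragments for one model turn."""

    def __init__(self):
        self._parts = []

    def add(self, text):
        # Skip empty or missing fragments.
        if text:
            self._parts.append(text)

    def flush(self):
        """Return the text collected so far and start a new turn."""
        text = "".join(self._parts)
        self._parts = []
        return text
```

In the Python receive loop, call buffer.add(response.server_content.output_transcription.text) whenever output_transcription is present. Call buffer.flush() once response.server_content.turn_complete is set.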

To enable transcription of the audio input sent to the model, send input_audio_transcription in the setup config.

Python

import asyncio
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-2.5-flash-native-audio-preview-12-2025"

config = {
    "response_modalities": ["AUDIO"],
    "input_audio_transcription": {},
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_data = Path("16000.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_data, mime_type='audio/pcm;rate=16000')
        )

        async for msg in session.receive():
            # server_content can be None, e.g. on usage-metadata-only messages
            if msg.server_content and msg.server_content.input_transcription:
                print('Transcript:', msg.server_content.input_transcription.text)

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;

const ai = new GoogleGenAI({});
const model = 'gemini-2.5-flash-native-audio-preview-12-2025';

const config = {
  responseModalities: [Modality.AUDIO],
  inputAudioTranscription: {}
};

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("16000.wav");

  // Convert to the format the API expects (16-bit PCM at 16 kHz). This sample assumes the file is already mono.
  const wav = new WaveFile();
  wav.fromBuffer(fileBuffer);
  wav.toSampleRate(16000);
  wav.toBitDepth("16");
  const base64Audio = wav.toBase64();

  // If already in correct format, you can use this:
  // const fileBuffer = fs.readFileSync("sample.pcm");
  // const base64Audio = Buffer.from(fileBuffer).toString('base64');

  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }
  );

  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
    else if (turn.serverContent && turn.serverContent.inputTranscription) {
      console.debug('Received input transcription: %s\n', turn.serverContent.inputTranscription.text);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();
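The JavaScript example converts a WAV file with the wavefile package. Below is a rough Python equivalent that uses only the standard library wave module. It does not resample on the client. Instead, it reports the file's own sample rate in the MIME type, relying on the Live API to resample input audio as described under Audio formats. It accepts only 16-bit WAV files and downmixes multi-channel audio by averaging. The function name is hypothetical.

```python
import array
import sys
import wave


def wav_to_pcm16_mono(path):
    """Read a 16-bit PCM WAV file and return (raw_pcm_bytes, mime_type)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if channels > 1:
        # Average interleaved channels (floor division) into one mono channel.
        mono = array.array("h", (
            sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)
        ))
    else:
        mono = samples

    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian for the API
    return mono.tobytes(), f"audio/pcm;rate={rate}"
```

Wrap the returned bytes and MIME type in types.Blob(data=..., mime_type=...) and pass them to session.send_realtime_input(audio=...), as in the Python example.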

Streaming audio and video

Changing voice and language

Native audio output models support any of the voices available for our Text-to-Speech (TTS) models. You can listen to all the voices in AI Studio.

To specify a voice, set the voice name within the speechConfig object as part of the session configuration:

Python

config = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
    },
}

JavaScript

const config = {
  responseModalities: [Modality.AUDIO],
  speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } } }
};

The Live API supports multiple languages. Native audio output models automatically choose the appropriate language and don't support explicitly setting the language code.

Native audio capabilities

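Our latest models feature native audio output, which provides natural, realistic-sounding speech and improved multilingual performance. Native audio also enables advanced features:

- affective (emotion-aware) dialog;
- proactive audio, where the model intelligently decides when to respond to input;
- "thinking".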

Affective dialog

This feature lets Gemini adapt its response style to the expression and tone of the input.

To use affective dialog, set the API version to v1alpha and set enable_affective_dialog to true in the setup message:

Python

client = genai.Client(http_options={"api_version": "v1alpha"})

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True
)

JavaScript

const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });

const config = {
  responseModalities: [Modality.AUDIO],
  enableAffectiveDialog: true
};

Proactive audio

When this feature is enabled, Gemini can proactively decide not to respond if the content is not relevant.

To use it, set the API version to v1alpha, configure the proactivity field in the setup message, and set proactive_audio to true:

Python

client = genai.Client(http_options={"api_version": "v1alpha"})

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity={'proactive_audio': True}
)

JavaScript

const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });

const config = {
  responseModalities: [Modality.AUDIO],
  proactivity: { proactiveAudio: true }
}

Thinking

The latest native audio output model, gemini-2.5-flash-native-audio-preview-12-2025, supports thinking capabilities, with dynamic thinking enabled by default.

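The thinkingBudget parameter guides the model on the number of thinking tokens to use when generating a response. You can disable thinking by setting thinkingBudget to 0. For more details on configuring the model's thinkingBudget, see the thinking budgets documentation.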

Python

from google.genai import types

model = "gemini-2.5-flash-native-audio-preview-12-2025"

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    thinking_config=types.ThinkingConfig(
        thinking_budget=1024,
    )
)

async with client.aio.live.connect(model=model, config=config) as session:
    # Send audio input and receive audio
    ...

JavaScript

const model = 'gemini-2.5-flash-native-audio-preview-12-2025';
const config = {
  responseModalities: [Modality.AUDIO],
  thinkingConfig: {
    thinkingBudget: 1024,
  },
};

async function main() {

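  const session = await ai.live.connect({
    model: model,
    config: config,
    callbacks: {
      // Minimal handler; see the connection example for onopen, onerror, and onclose.
      onmessage: (message) => console.debug(message),
    },
  });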

  // Send audio input and receive audio

  session.close();
}

main();

You can also enable thought summaries by setting includeThoughts to true in your config. See thought summaries for more info:

Python

from google.genai import types

model = "gemini-2.5-flash-native-audio-preview-12-2025"

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    thinking_config=types.ThinkingConfig(
        thinking_budget=1024,
        include_thoughts=True
    )
)

JavaScript

const model = 'gemini-2.5-flash-native-audio-preview-12-2025';
const config = {
  responseModalities: [Modality.AUDIO],
  thinkingConfig: {
    thinkingBudget: 1024,
    includeThoughts: true,
  },
};
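The options in this section and the voice settings above are all fields of the same session config. This sketch, a hypothetical helper, assembles them into one dict. It uses the snake_case keys from the Python dict examples in this guide. Passing the result as config= to client.aio.live.connect() assumes the SDK accepts these nested keys, as it does for the other dict configs shown here.

```python
def build_live_config(voice_name=None, thinking_budget=None,
                      include_thoughts=False, output_transcription=False):
    """Assemble a Live API config dict from the options in this guide."""
    config = {"response_modalities": ["AUDIO"]}

    if voice_name is not None:
        config["speech_config"] = {
            "voice_config": {"prebuilt_voice_config": {"voice_name": voice_name}}
        }

    if thinking_budget is not None or include_thoughts:
        thinking = {}
        if thinking_budget is not None:
            thinking["thinking_budget"] = thinking_budget  # 0 disables thinking
        if include_thoughts:
            thinking["include_thoughts"] = True  # request thought summaries
        config["thinking_config"] = thinking

    if output_transcription:
        config["output_audio_transcription"] = {}

    return config
```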

Voice Activity Detection (VAD)

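Voice Activity Detection (VAD) allows the model to recognize when a person is speaking. This is essential for natural conversations, because it lets the user interrupt the model at any time.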

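When VAD detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message that reports the interruption.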

The Gemini server also discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls.

Python

async for response in session.receive():
    if response.server_content and response.server_content.interrupted is True:
        # The generation was interrupted

        # If realtime playback is implemented in your application,
        # you should stop playing audio and clear queued playback here.
        pass

JavaScript

const turns = await handleTurn();

for (const turn of turns) {
  if (turn.serverContent && turn.serverContent.interrupted) {
    // The generation was interrupted

    // If realtime playback is implemented in your application,
    // you should stop playing audio and clear queued playback here.
  }
}
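The comments above leave playback handling to your application. One way to structure it is to buffer received audio chunks and discard the unplayed ones when interrupted is reported. The sketch below does this with a hypothetical PlaybackQueue class. Stopping the chunk that is currently playing depends on your audio library and is not shown.

```python
from collections import deque


class PlaybackQueue:
    """Client-side buffer of model audio chunks awaiting playback."""

    def __init__(self):
        self._chunks = deque()

    def enqueue(self, chunk):
        """Add a received audio chunk to the end of the queue."""
        self._chunks.append(chunk)

    def next_chunk(self):
        """Return the next chunk to play, or None if the queue is empty."""
        return self._chunks.popleft() if self._chunks else None

    def on_interrupted(self):
        """Drop every chunk not yet played and return how many were dropped."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped
```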

Automatic VAD

By default, the model automatically performs VAD on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration.

When the audio stream is paused for more than a second (for example, because the user switched off the microphone), send an audioStreamEnd event to flush any cached audio. The client can resume sending audio data at any time.

Python

# example audio file to try:
# URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
# !wget -q $URL -O sample.pcm
import asyncio
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        # if stream gets paused, send:
        # await session.send_realtime_input(audio_stream_end=True)

        async for response in session.receive():
            if response.text is not None:
                print(response.text)

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

// example audio file to try:
// URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
// !wget -q $URL -O sample.pcm
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("sample.pcm");
  const base64Audio = Buffer.from(fileBuffer).toString('base64');

  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }

  );

  // if stream gets paused, send:
  // session.sendRealtimeInput({ audioStreamEnd: true })

  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();
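The examples above mention audio_stream_end only in comments. This sketch, a hypothetical helper, tracks when audio was last sent. It reports once per pause when the gap exceeds one second, which is when audioStreamEnd should be sent. The caller supplies the timestamps, for example from time.monotonic().

```python
class AudioStreamEndTracker:
    """Decide when to send audio_stream_end after a pause in the audio stream.

    All timestamps are in seconds.
    """

    def __init__(self, pause_threshold=1.0):
        self.pause_threshold = pause_threshold
        self._last_audio = None
        self._ended = True  # nothing is streaming yet

    def on_audio_sent(self, now):
        """Record that an audio chunk was just sent."""
        self._last_audio = now
        self._ended = False  # the stream is active again

    def should_send_end(self, now):
        """Return True exactly once per pause longer than pause_threshold."""
        if self._ended or self._last_audio is None:
            return False
        if now - self._last_audio > self.pause_threshold:
            self._ended = True
            return True
        return False
```

When should_send_end() returns True, call await session.send_realtime_input(audio_stream_end=True). Calling on_audio_sent() again when audio resumes re-arms the tracker.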

With send_realtime_input, the API responds to audio automatically based on VAD. send_client_content adds messages to the model context in order. send_realtime_input instead optimizes for responsiveness at the expense of deterministic ordering.
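The examples above send a whole file in one send_realtime_input call. For live input you would normally stream smaller pieces instead. This sketch splits raw 16-bit PCM into fixed-duration chunks aligned to whole samples. The 100 ms default is an arbitrary choice, not an API requirement, and the function name is hypothetical.

```python
def pcm_chunks(pcm_bytes, sample_rate=16_000, chunk_ms=100, channels=1):
    """Yield consecutive slices of raw 16-bit PCM, each chunk_ms long.

    The last chunk may be shorter. Chunk boundaries always fall on whole
    frames, so no 16-bit sample is split across two messages.
    """
    bytes_per_ms = sample_rate * 2 * channels / 1000
    chunk_size = int(bytes_per_ms * chunk_ms)
    chunk_size -= chunk_size % (2 * channels)  # align to whole frames
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[start:start + chunk_size]
```

Send each chunk with await session.send_realtime_input(audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")). When replaying a file rather than a microphone, you may also want to sleep for roughly chunk_ms between sends, so the audio arrives at a real-time pace.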

Automatic VAD configuration

For more control over VAD activity, configure the following parameters. See the API reference for more info.

Python

from google.genai import types

config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False, # default
            "start_of_speech_sensitivity": types.StartSensitivity.START_SENSITIVITY_LOW,
            "end_of_speech_sensitivity": types.EndSensitivity.END_SENSITIVITY_LOW,
            "prefix_padding_ms": 20,
            "silence_duration_ms": 100,
        }
    }
}

JavaScript

import { GoogleGenAI, Modality, StartSensitivity, EndSensitivity } from '@google/genai';

const config = {
  responseModalities: [Modality.TEXT],
  realtimeInputConfig: {
    automaticActivityDetection: {
      disabled: false, // default
      startOfSpeechSensitivity: StartSensitivity.START_SENSITIVITY_LOW,
      endOfSpeechSensitivity: EndSensitivity.END_SENSITIVITY_LOW,
      prefixPaddingMs: 20,
      silenceDurationMs: 100,
    }
  }
};

Disabling automatic VAD

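Alternatively, you can disable automatic VAD by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration, the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. No audioStreamEnd is sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message.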

Python

config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}

async with client.aio.live.connect(model=model, config=config) as session:
    # ...
    await session.send_realtime_input(activity_start=types.ActivityStart())
    await session.send_realtime_input(
        audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
    )
    await session.send_realtime_input(activity_end=types.ActivityEnd())
    # ...

JavaScript

const config = {
  responseModalities: [Modality.TEXT],
  realtimeInputConfig: {
    automaticActivityDetection: {
      disabled: true,
    }
  }
};

session.sendRealtimeInput({ activityStart: {} })

session.sendRealtimeInput(
  {
    audio: {
      data: base64Audio,
      mimeType: "audio/pcm;rate=16000"
    }
  }

);

session.sendRealtimeInput({ activityEnd: {} })
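With automatic VAD disabled, the client has to decide when speech starts and ends. The sketch below uses a simple energy (RMS) threshold with a short hangover, a deliberately basic technique. The threshold, frame size and hangover values are hypothetical and would need tuning. A production application would more likely use a dedicated VAD library.

```python
import math
import struct


def rms(frame):
    """Root-mean-square level of a little-endian 16-bit PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EnergyVad:
    """Toy energy-threshold VAD that emits activity_start / activity_end events.

    threshold: RMS level treated as speech (hypothetical default).
    hangover_frames: quiet frames tolerated before declaring the end of speech.
    """

    def __init__(self, threshold=500.0, hangover_frames=5):
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self.speaking = False
        self._quiet = 0

    def process(self, frame):
        """Return "activity_start", "activity_end", or None for this frame."""
        loud = rms(frame) >= self.threshold
        if not self.speaking:
            if loud:
                self.speaking = True
                self._quiet = 0
                return "activity_start"
            return None
        if loud:
            self._quiet = 0  # still speaking; reset the hangover count
            return None
        self._quiet += 1
        if self._quiet >= self.hangover_frames:
            self.speaking = False
            return "activity_end"
        return None
```

The events map onto the API as in the Python example above:

- on "activity_start", send activity_start=types.ActivityStart();
- stream the audio frames;
- on "activity_end", send activity_end=types.ActivityEnd().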

Token count

You can find the total number of consumed tokens in the usageMetadata field of the returned server message.

Python

async for message in session.receive():
    # The server will periodically send messages that include UsageMetadata.
    if message.usage_metadata:
        usage = message.usage_metadata
        print(
            f"Used {usage.total_token_count} tokens in total. Response token breakdown:"
        )
        for detail in usage.response_tokens_details or []:
            match detail:
                case types.ModalityTokenCount(modality=modality, token_count=count):
                    print(f"{modality}: {count}")

JavaScript

const turns = await handleTurn();

for (const turn of turns) {
  if (turn.usageMetadata) {
    console.debug('Used %s tokens in total. Response token breakdown:\n', turn.usageMetadata.totalTokenCount);

    for (const detail of turn.usageMetadata.responseTokensDetails ?? []) {
      console.debug('%s\n', detail);
    }
  }
}

Media resolution

You can specify the media resolution for the input media by setting the mediaResolution field as part of the session configuration:

Python

from google.genai import types

config = {
    "response_modalities": ["AUDIO"],
    "media_resolution": types.MediaResolution.MEDIA_RESOLUTION_LOW,
}

JavaScript

import { GoogleGenAI, Modality, MediaResolution } from '@google/genai';

const config = {
    responseModalities: [Modality.TEXT],
    mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW,
};

Limitations

Consider the following limitations of the Live API when you plan your project.

Response modalities

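You can set only one response modality (TEXT or AUDIO) per session in the session configuration. Setting both results in a config error message. In other words, you can configure the model to respond with either text or audio, but not both in the same session.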
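Because only one response modality is allowed per session, it can help to validate a config before connecting. The following is a minimal sketch with a hypothetical function name. It checks dict configs like those in this guide. As a simplification, it requires response_modalities to be set explicitly.

```python
def check_response_modalities(config):
    """Return the single response modality, or raise ValueError."""
    modalities = config.get("response_modalities", [])
    if len(modalities) != 1:
        raise ValueError(
            f"expected exactly one response modality, got {modalities!r}"
        )
    if modalities[0] not in ("TEXT", "AUDIO"):
        raise ValueError(f"unsupported response modality: {modalities[0]!r}")
    return modalities[0]
```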

Client authentication

By default, the Live API provides only server-to-server authentication. If you implement your Live API application with a client-to-server approach, use ephemeral tokens to mitigate security risks.

Session duration

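Audio-only sessions are limited to 15 minutes, and audio plus video sessions are limited to 2 minutes. However, you can configure different session management techniques to extend session duration indefinitely.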
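The following is a small sketch for tracking these limits on the client, so the application can act before the session ends. The constants restate the limits above, and the function name is hypothetical. What you do when time runs low, for example applying one of the session management techniques, is left to you.

```python
AUDIO_ONLY_LIMIT_S = 15 * 60   # audio-only sessions: 15 minutes
AUDIO_VIDEO_LIMIT_S = 2 * 60   # audio plus video sessions: 2 minutes


def session_seconds_left(started_at, now, has_video):
    """Seconds remaining before the session hits its duration limit (never negative)."""
    limit = AUDIO_VIDEO_LIMIT_S if has_video else AUDIO_ONLY_LIMIT_S
    return max(0.0, limit - (now - started_at))
```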

Context window

A session has a context window limit of:

- 128,000 tokens for native audio output models
- 32,000 tokens for other Live API models
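The following sketch checks how much of the context window remains. The caller supplies tokens_used, and how that value is obtained is an assumption (for example, derived from usage_metadata as described under Token count). The 90% warning margin is a hypothetical policy, not an API setting.

```python
NATIVE_AUDIO_CONTEXT_LIMIT = 128_000  # native audio output models
OTHER_LIVE_CONTEXT_LIMIT = 32_000     # other Live API models


def _limit(native_audio):
    return NATIVE_AUDIO_CONTEXT_LIMIT if native_audio else OTHER_LIVE_CONTEXT_LIMIT


def context_headroom(tokens_used, native_audio):
    """Tokens left in the session context window for the given model type."""
    return _limit(native_audio) - tokens_used


def near_context_limit(tokens_used, native_audio, margin=0.9):
    """True once usage reaches `margin` of the window (hypothetical policy)."""
    return tokens_used >= _limit(native_audio) * margin
```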

Supported languages

The Live API supports the following languages.

Language	BCP-47 code	Language	BCP-47 code
German (Germany)	`de-DE`	English (Australia)*	`en-AU`
English (UK)*	`en-GB`	English (India)	`en-IN`
English (US)	`en-US`	Spanish (US)	`es-US`
French (France)	`fr-FR`	Hindi (India)	`hi-IN`
Portuguese (Brazil)	`pt-BR`	Arabic (Generic)	`ar-XA`
Spanish (Spain)*	`es-ES`	French (Canada)*	`fr-CA`
Indonesian (Indonesia)	`id-ID`	Italian (Italy)	`it-IT`
Japanese (Japan)	`ja-JP`	Turkish (Turkey)	`tr-TR`
Vietnamese (Vietnam)	`vi-VN`	Bengali (India)	`bn-IN`
Gujarati (India)*	`gu-IN`	Kannada (India)*	`kn-IN`
Marathi (India)	`mr-IN`	Malayalam (India)*	`ml-IN`
Tamil (India)	`ta-IN`	Telugu (India)	`te-IN`
Dutch (Netherlands)	`nl-NL`	Korean (South Korea)	`ko-KR`
Mandarin Chinese (China)*	`cmn-CN`	Polish (Poland)	`pl-PL`
Russian (Russia)	`ru-RU`	Thai (Thailand)	`th-TH`

Languages marked with an asterisk (*) are not available for native audio.
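A client that offers a language choice in its UI can encode the table as data. This sketch is a hypothetical helper, and codes are matched exactly as written in the table. Native audio output models choose the language automatically, so the check is only a client-side pre-check, not something sent to the API.

```python
# BCP-47 codes from the table above.
# True means the language is also available for native audio (no asterisk).
SUPPORTED_LANGUAGES = {
    "de-DE": True, "en-AU": False, "en-GB": False, "en-IN": True,
    "en-US": True, "es-US": True, "fr-FR": True, "hi-IN": True,
    "pt-BR": True, "ar-XA": True, "es-ES": False, "fr-CA": False,
    "id-ID": True, "it-IT": True, "ja-JP": True, "tr-TR": True,
    "vi-VN": True, "bn-IN": True, "gu-IN": False, "kn-IN": False,
    "mr-IN": True, "ml-IN": False, "ta-IN": True, "te-IN": True,
    "nl-NL": True, "ko-KR": True, "cmn-CN": False, "pl-PL": True,
    "ru-RU": True, "th-TH": True,
}


def is_language_supported(code, native_audio):
    """Check a BCP-47 code against the table, honoring the native-audio asterisks."""
    if code not in SUPPORTED_LANGUAGES:
        return False
    return SUPPORTED_LANGUAGES[code] or not native_audio
```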

Next steps

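Read the Tool use and Session management guides for essential information on using the Live API effectively.
Try the Live API in Google AI Studio.
For more info about the Live API models, see Gemini 2.5 Flash Native Audio on the Models page.
Try more examples in the Live API cookbook, the Live API Tools cookbook, and the Live API Get Started script.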
