Live API

The Live API enables low-latency, bidirectional voice and video interactions with Gemini. With the Live API, you can give end users the experience of natural, human-like voice conversations, including the ability to interrupt the model's responses with voice commands. The model can process text, audio, and video input, and it can provide text and audio output.

You can try the Live API in Google AI Studio.

What's new

The Live API has new features and capabilities!

New capabilities:

  • Two new voices and 30 new languages, with configurable output language
  • Configurable image resolutions (66 or 256 tokens per image)
  • Configurable turn coverage: Send all inputs all the time or only when the user is speaking
  • Configure if input should interrupt the model or not
  • Configurable Voice Activity Detection and new client events for end of turn signaling
  • Token counts
  • A client event for signaling end of stream
  • Text streaming
  • Configurable session resumption, with session data stored on the server for 24 hours
  • Longer session support with a sliding context window

New client events:

  • End of audio stream / mic closed
  • Activity start/end events for manually controlling turn transition

New server events:

  • Go away notification signaling a need to restart a session
  • Generation complete

Use the Live API

This section describes how to use the Live API with one of our SDKs. For more information about the underlying WebSockets API, see the WebSockets API reference.

Send and receive text

import asyncio
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        while True:
            message = input("User> ")
            if message.lower() == "exit":
                break
            await session.send_client_content(
                turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
            )

            async for response in session.receive():
                if response.text is not None:
                    print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())

Receive audio

The following example shows how to receive audio data and write it to a .wav file.

import asyncio
import wave
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY", http_options={'api_version': 'v1alpha'})
model = "gemini-2.0-flash-live-001"

config = {"response_modalities": ["AUDIO"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        wf = wave.open("audio.wav", "wb")
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)

        message = "Hello? Gemini are you there?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.data is not None:
                wf.writeframes(response.data)

            # Un-comment this code to print audio data info
            # if response.server_content.model_turn is not None:
            #      print(response.server_content.model_turn.parts[0].inline_data.mime_type)

        wf.close()

if __name__ == "__main__":
    asyncio.run(main())

Audio formats

The Live API supports the following audio formats:

  • Input audio format: Raw 16-bit PCM audio at 16 kHz, little-endian
  • Output audio format: Raw 16-bit PCM audio at 24 kHz, little-endian
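
As an illustration (this helper isn't part of the API), here is one way you might convert an arbitrary WAV file into the raw 16-bit, 16 kHz, little-endian PCM expected as input, assuming the third-party soundfile and numpy packages are installed:

import numpy as np
import soundfile as sf

def to_live_api_pcm(path: str) -> bytes:
    # Read samples as 16-bit integers (little-endian on typical platforms).
    audio, rate = sf.read(path, dtype="int16")
    if audio.ndim > 1:
        # Downmix multi-channel audio to mono.
        audio = audio.mean(axis=1).astype(np.int16)
    if rate != 16000:
        # Naive linear resampling to 16 kHz; use a proper resampler in production.
        target_len = int(len(audio) * 16000 / rate)
        positions = np.linspace(0, len(audio) - 1, target_len)
        audio = np.interp(positions, np.arange(len(audio)), audio).astype(np.int16)
    return audio.tobytes()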

Stream audio and video
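
The snippet below is a rough sketch of streaming live microphone audio to the model. It is not an official sample: it assumes the third-party pyaudio package for capture, and it assumes the SDK exposes session.send_realtime_input accepting a types.Blob; the exact method name and keyword may differ by SDK version.

import asyncio
import pyaudio
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")
model = "gemini-2.0-flash-live-001"
config = {"response_modalities": ["AUDIO"]}

async def stream_microphone():
    pa = pyaudio.PyAudio()
    # 16-bit PCM, mono, 16 kHz -- the input format the Live API expects.
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                     input=True, frames_per_buffer=1024)
    async with client.aio.live.connect(model=model, config=config) as session:
        try:
            while True:
                chunk = stream.read(1024, exception_on_overflow=False)
                # send_realtime_input is assumed here; older SDK versions may use
                # a different method or keyword for realtime media.
                await session.send_realtime_input(
                    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
                )
                await asyncio.sleep(0)  # yield so responses can be processed elsewhere
        finally:
            stream.close()
            pa.terminate()

if __name__ == "__main__":
    asyncio.run(stream_microphone())

Video frames can typically be sent the same way, as individual JPEG-encoded Blobs (mime_type="image/jpeg").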

System instructions

System instructions let you steer the behavior of a model based on your specific needs and use cases. System instructions can be set in the setup configuration and will remain in effect for the entire session.

from google.genai import types

config = {
    "system_instruction": types.Content(
        parts=[
            types.Part(
                text="You are a helpful assistant and answer in a friendly tone."
            )
        ]
    ),
    "response_modalities": ["TEXT"],
}

Incremental content updates

Use incremental updates to send text input, establish session context, or restore session context. For short contexts you can send turn-by-turn interactions to represent the exact sequence of events:

turns = [
    {"role": "user", "parts": [{"text": "What is the capital of France?"}]},
    {"role": "model", "parts": [{"text": "Paris"}]},
]

await session.send_client_content(turns=turns, turn_complete=False)

turns = [{"role": "user", "parts": [{"text": "What is the capital of Germany?"}]}]

await session.send_client_content(turns=turns, turn_complete=True)
Over the WebSockets API, the same kind of update is sent as a clientContent message (the text fields here are placeholders):

{
  "clientContent": {
    "turns": [
      {
        "parts": [
          {
            "text": ""
          }
        ],
        "role": "user"
      },
      {
        "parts": [
          {
            "text": ""
          }
        ],
        "role": "model"
      }
    ],
    "turnComplete": true
  }
}

For longer contexts it's recommended to provide a single message summary to free up the context window for subsequent interactions.
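
For example, rather than replaying every turn, you might send one condensed user turn and leave the turn open (a sketch; the summary text here is purely illustrative):

summary = (
    "Context from the previous session: the user asked for European capitals "
    "and was told that Paris is the capital of France."
)

await session.send_client_content(
    turns=[{"role": "user", "parts": [{"text": summary}]}],
    turn_complete=False,
)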

Change voices

The Live API supports the following voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr.

To specify a voice, set the voice name within the speechConfig object as part of the session configuration:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    )
)
Over the WebSockets API, the same setting is expressed within the speechConfig object of the setup message:

{
  "voiceConfig": {
    "prebuiltVoiceConfig": {
      "voiceName": "Kore"
    }
  }
}

Use function calling

You can define tools with the Live API. See the Function calling tutorial to learn more about function calling.

Tools must be defined as part of the session configuration:

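# `set_light_values` is assumed to be a function (or FunctionDeclaration) you
# have defined, as described in the Function calling tutorial.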
config = types.LiveConnectConfig(
    response_modalities=["TEXT"],
    tools=[set_light_values]
)

async with client.aio.live.connect(model=model, config=config) as session:
    await session.send_client_content(
        turns={
            "role": "user",
            "parts": [{"text": "Turn the lights down to a romantic level"}],
        },
        turn_complete=True,
    )

    async for response in session.receive():
        print(response.tool_call)

From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. The execution pauses until the results of each function call are available, which ensures sequential processing.

The client should respond with BidiGenerateContentToolResponse.
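
Here is a rough sketch of how a client might send that response with the SDK. It assumes the session.send_tool_response helper and types.FunctionResponse; the result payload is purely illustrative.

from google.genai import types

async for response in session.receive():
    if response.tool_call is not None:
        function_responses = []
        for call in response.tool_call.function_calls:
            # Run your own implementation of the requested function here, then
            # echo back the call id and name with the result.
            function_responses.append(
                types.FunctionResponse(
                    id=call.id,
                    name=call.name,
                    response={"result": "ok"},
                )
            )
        await session.send_tool_response(function_responses=function_responses)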

Audio inputs and audio outputs negatively impact the model's ability to use function calling.

Handle interruptions

Users can interrupt the model's output at any time. When Voice activity detection (VAD) detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message to report the interruption.

In addition, the Gemini server discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls.

async for response in session.receive():
    if response.server_content.interrupted is not None:
        # The generation was interrupted
        ...

Configure voice activity detection (VAD)

By default, the model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration.

When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time.

Alternatively, the automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message.

SDK support for this feature will be available in the coming weeks.

Get the token count

You can find the total number of consumed tokens in the usageMetadata field of the returned server message.

from google.genai import types

async with client.aio.live.connect(
    model='gemini-2.0-flash-live-001',
    config=types.LiveConnectConfig(
        response_modalities=['AUDIO'],
    ),
) as session:
    # Session connected
    while True:
        await session.send_client_content(
            turns=types.Content(role='user', parts=[types.Part(text='Hello world!')])
        )
        async for message in session.receive():
            # The server will periodically send messages that include
            # UsageMetadata.
            if message.usage_metadata:
                usage = message.usage_metadata
                print(
                    f'Used {usage.total_token_count} tokens in total. Response token'
                    ' breakdown:'
                )
                for detail in usage.response_tokens_details:
                    match detail:
                        case types.ModalityTokenCount(modality=modality, token_count=count):
                            print(f'{modality}: {count}')

            # For the purposes of this example, placeholder input is continually fed
            # to the model. In non-sample code, the model inputs would come from
            # the user.
            if message.server_content and message.server_content.turn_complete:
                break

Configure session resumption

To prevent session termination when the server periodically resets the WebSocket connection, configure the sessionResumption field within the setup configuration.

Passing this configuration causes the server to send SessionResumptionUpdate messages, which can be used to resume the session by passing the last resumption token as the SessionResumptionConfig.handle of the subsequent connection.

from google.genai import types

print(f"Connecting to the service with handle {previous_session_handle}...")
async with client.aio.live.connect(
    model="gemini-2.0-flash-live-001",
    config=types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        session_resumption=types.SessionResumptionConfig(
            # The handle of the session to resume is passed here,
            # or else None to start a new session.
            handle=previous_session_handle
        ),
    ),
) as session:
    # Session connected
    while True:
        await session.send_client_content(
            turns=types.Content(
                role="user", parts=[types.Part(text="Hello world!")]
            )
        )
        async for message in session.receive():
            # Periodically, the server will send update messages that may
            # contain a handle for the current state of the session.
            if message.session_resumption_update:
                update = message.session_resumption_update
                if update.resumable and update.new_handle:
                    # The handle should be retained and linked to the session.
                    return update.new_handle

            # For the purposes of this example, placeholder input is continually fed
            # to the model. In non-sample code, the model inputs would come from
            # the user.
            if message.server_content and message.server_content.turn_complete:
                break

Receive a message before the session disconnects

The server sends a GoAway message to signal that the current connection will soon be terminated. This message includes timeLeft, which indicates the remaining connection time, so you can take action before the connection is terminated as ABORTED.
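
A minimal sketch of watching for this in the receive loop (the go_away and time_left attribute names are assumed from the SDK types):

async for message in session.receive():
    if message.go_away is not None:
        # time_left indicates roughly how long before the connection is dropped.
        print(f"Connection closing in {message.go_away.time_left}")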

Receive a message when the generation is complete

The server sends a generationComplete message that signals that the model finished generating the response.
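
A minimal sketch of detecting this in the receive loop (the generation_complete attribute name is assumed from the SDK types):

async for message in session.receive():
    if message.server_content and message.server_content.generation_complete:
        # The model has finished generating this turn.
        print("Generation complete")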

Enable context window compression

To enable longer sessions, and avoid abrupt connection termination, you can enable context window compression by setting the contextWindowCompression field as part of the session configuration.

In the ContextWindowCompressionConfig, you can configure a sliding-window mechanism and the number of tokens that triggers compression.

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    context_window_compression=(
        # Configures compression with default parameters.
        types.ContextWindowCompressionConfig(
            sliding_window=types.SlidingWindow(),
        )
    ),
)
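
If the defaults don't fit your use case, you can also set the thresholds explicitly. The field names below (trigger_tokens, target_tokens) and the values are assumptions that may differ by SDK version:

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    context_window_compression=types.ContextWindowCompressionConfig(
        # Assumed fields: start compressing once the context reaches trigger_tokens,
        # and compress it down to roughly target_tokens.
        trigger_tokens=25600,
        sliding_window=types.SlidingWindow(target_tokens=12800),
    ),
)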

Change the media resolution

You can specify the media resolution for the input media by setting the mediaResolution field as part of the session configuration:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,
)

Limitations

Consider the following limitations of the Live API and Gemini 2.0 when you plan your project.

Client authentication

The Live API provides only server-to-server authentication and isn't recommended for direct client use. Client input should be routed through an intermediate application server, which authenticates securely with the Live API.

Session duration

With session compression enabled, session duration is unlimited. Without compression, audio-only sessions are limited to 15 minutes and audio-plus-video sessions are limited to 2 minutes; exceeding these limits terminates the connection.

Context window

A session has a context window limit of 32k tokens.

Third-party integrations

For web and mobile app deployments, you can explore options from: