The Interactions API is now generally available. We recommend using this API for access to all the latest features and models.

Live translation with Gemini Live API

The Gemini Live API supports low-latency, real-time speech to speech translation between 70+ languages using the gemini-3.5-live-translate-preview model. By configuring the Live API with translation settings, you can stream audio in one language and receive translated audio output in another language, enabling seamless real-time voice-to-voice translation.

Try the Live Translate in Google AI Studio Clone the example app from GitHub Use coding agent skills

Live Agent vs. Live Translation

While both use the Live API, the mental model for Live Translation is different from conversational real-time agent interactions.

Live Agent	Live Translation
The model acts as an assistant. It listens, reasons, and takes actions on your behalf.	The model acts as an interpreter. It behaves as a real-time translator pipeline.
Uses turn-based interactions. Relies on pauses, intent detection, and handles interruptions.	Uses continuous stream processing. Translates as the speaker talks without waiting for turns.
Supports tools and agents. Native support for function calling, Google Search, and instructions.	Supports translation only. Pure low-latency translation; no support for tools or instructions.
Fully multimodal. Supports text, audio, video, and image inputs.	Audio restricted. Input is limited to audio to ensure strict real-time latency thresholds.
Granular configuration. Uses generation, speech, tools, and system instructions.	Simplified configuration. Set `target_language_code` and toggles like `echo_target_language`.

Get started

The following examples demonstrate how to initialize a client and connect to the Live API with a translation configuration.

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client()

model = "gemini-3.5-live-translate-preview"
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
    translation_config=types.TranslationConfig(
        target_language_code="pl",
        echo_target_language=True
    )
)

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print("Session started with translation")
        # Start receiving the translated audio stream
        async for response in session.receive():
            if response.server_content:
                if response.server_content.input_transcription:
                    print(f"Input transcript: {response.server_content.input_transcription.text}")
                if response.server_content.output_transcription:
                    print(f"Output transcript: {response.server_content.output_transcription.text}")
                if response.server_content.model_turn:
                    for part in response.server_content.model_turn.parts:
                        if part.inline_data:
                            audio_data = part.inline_data.data
                            # Play or process the translated audio chunk
                            print(f"Received audio chunk ({len(audio_data)} bytes)")

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({});
const model = 'gemini-3.5-live-translate-preview';
const config = {
    responseModalities: [Modality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
    translationConfig: {
        targetLanguageCode: 'pl',
        echoTargetLanguage: true
    }
};

async function main() {
  const session = await ai.live.connect({
    model: model,
    config: config,
    callbacks: {
      onopen: () => console.debug('Opened'),
      onmessage: (message) => {
        const content = message.serverContent;
        if (content?.inputTranscription) {
          console.log('Input transcript:', content.inputTranscription.text);
        }
        if (content?.outputTranscription) {
          console.log('Output transcript:', content.outputTranscription.text);
        }
        if (content?.modelTurn?.parts) {
          for (const part of content.modelTurn.parts) {
            if (part.inlineData) {
              const audioData = part.inlineData.data;
              // Play or process the translated audio chunk (base64 encoded)
              console.debug(`Received audio chunk (${audioData.length} bytes)`);
            }
          }
        }
      },
      onerror: (e) => console.debug('Error:', e.message),
      onclose: (e) => console.debug('Close:', e.reason),
    },
  });

  console.debug("Session started with translation");
}

main();

WebSockets

const API_KEY = "YOUR_API_KEY";
const MODEL_NAME = "gemini-3.5-live-translate-preview";
const WS_URL = `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=${API_KEY}`;

const websocket = new WebSocket(WS_URL);

websocket.onopen = () => {
  console.log('WebSocket Connected');

  const setupMessage = {
    setup: {
      model: `models/${MODEL_NAME}`,
      generationConfig: {
        responseModalities: ['AUDIO'],
        inputAudioTranscription: {},
        outputAudioTranscription: {},
        translationConfig: {
          targetLanguageCode: 'pl',
          echoTargetLanguage: true
        }
      }
    }
  };
  websocket.send(JSON.stringify(setupMessage));
};

websocket.onmessage = (event) => {
  const response = JSON.parse(event.data);
  if (response.serverContent) {
    const content = response.serverContent;
    if (content.inputTranscription) {
      console.log('Input transcript:', content.inputTranscription.text, `(${content.inputTranscription.languageCode})`);
    }
    if (content.outputTranscription) {
      console.log('Output transcript:', content.outputTranscription.text, `(${content.outputTranscription.languageCode})`);
    }
    if (content.modelTurn?.parts) {
      for (const part of content.modelTurn.parts) {
        if (part.inlineData) {
          const audioData = part.inlineData.data;
          // Play or process the translated audio chunk (base64 encoded)
          console.debug(`Received audio chunk (${audioData.length} bytes)`);
        }
      }
    }
  }
};

Sending audio

To stream voice inputs for translation, you send raw, little-endian, 16-bit PCM audio.

Input audio format: Raw 16-bit PCM at 16kHz (mono, little-endian).
Output audio format: Raw 16-bit PCM at 24kHz (mono, little-endian).
Chunk Size and Latency: Send audio in chunks of 100ms.

The following examples show how to send audio chunks to the session.

Python

# Assuming 'chunk' is your raw PCM audio bytes
await session.send_realtime_input(
    audio=types.Blob(
        data=chunk,
        mime_type="audio/pcm;rate=16000"
    )
)

JavaScript

// Assuming 'chunk' is a Buffer of raw PCM audio
session.sendRealtimeInput({
  audio: {
    data: chunk.toString('base64'),
    mimeType: 'audio/pcm;rate=16000'
  }
});

WebSockets

// Assuming 'chunk' is a Buffer of raw PCM audio
function sendAudioChunk(chunk) {
  if (websocket.readyState === WebSocket.OPEN) {
    const audioMessage = {
      realtimeInput: {
        audio: {
          data: chunk.toString('base64'),
          mimeType: 'audio/pcm;rate=16000'
        }
      }
    };
    websocket.send(JSON.stringify(audioMessage));
  }
}

Configuration

To enable translation, you must specify the translationConfig within the generationConfig during the session setup.

Setup message configuration

The generationConfig supports the following fields to enable transcripts:

inputAudioTranscription: An object that, when present, enables the model to send text transcripts of the input audio.
outputAudioTranscription: An object that, when present, enables the model to send text transcripts of the output (translated) audio.

The translationConfig supports the following fields:

targetLanguageCode: The BCP-47 language code of the language you want the model to translate into (e.g., "pl" for Polish, "es" for Spanish). Defaults to "en".
echoTargetLanguage: A boolean indicating how input audio that is already in the target language should be handled. If set to true, the model will echo (parrot) input audio that is already in the target language. If set to false, the model will stay silent when the input speech is already in the target language. Defaults to false.

Here is an example of the setup message structure:

"setup": {
    "model": "models/gemini-3.5-live-translate-preview",
    "generationConfig": {
      "responseModalities": [
        "AUDIO"
      ],
      "inputAudioTranscription": {},
      "outputAudioTranscription": {},
      "translationConfig": {
        "targetLanguageCode": "pl",
        "echoTargetLanguage": true
      }
    }
}

Use ephemeral tokens in client-side applications

For client-to-server applications, you can use ephemeral tokens (currently in v1beta) to avoid exposing your API key.

When using ephemeral tokens with Live Translation:

You must use the v1beta endpoint.
Locking configuration: By default, you should specify the translationConfig in the token creation constraints on your server. This ensures the translation configuration is locked and cannot be tampered with by the client.
Unlocking configuration: If you want to be able to set the translationConfig on the client-side (for example, to let a user choose their own target language), you must omit it from the token creation request and set "lock_additional_fields": [] instead. This will unlock translationConfig to be set on the client-side.

Create a constrained ephemeral token

The following examples demonstrate how to create an ephemeral token with translation constraints.

Python

import datetime
from google import genai

now = datetime.datetime.now(tz=datetime.timezone.utc)

client = genai.Client()

token = client.auth_tokens.create(
    config = {
        'uses': 1,
        'expire_time': now + datetime.timedelta(minutes=30),
        'live_connect_constraints': {
            'model': 'gemini-3.5-live-translate-preview',
            'config': {
                'translation_config': {
                    'target_language_code': 'pl',
                    'echo_target_language': True
                }
            }
        },
    }
)

JavaScript

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});
const expireTime = new Date(Date.now() + 30 * 60 * 1000).toISOString();

const token = await client.authTokens.create({
    config: {
        uses: 1,
        expireTime: expireTime,
        liveConnectConstraints: {
            model: 'gemini-3.5-live-translate-preview',
            config: {
                responseModalities: ['AUDIO'],
                inputAudioTranscription: {},
                outputAudioTranscription: {},
                translationConfig: {
                    targetLanguageCode: 'pl',
                    echoTargetLanguage: true
                }
            }
        },
    },
});

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/auth_tokens" \
  -H "x-goog-api-key: ${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "uses": 1,
    "expireTime": "YYYY-MM-DDTHH:MM:SSZ",
    "liveConnectConstraints": {
      "model": "models/gemini-3.5-live-translate-preview",
      "config": {
        "responseModalities": ["AUDIO"],
        "inputAudioTranscription": {},
        "outputAudioTranscription": {},
        "translationConfig": {
          "targetLanguageCode": "pl",
          "echoTargetLanguage": true
        }
      }
    }
  }'

Limitations

Input Modalities: Only audio input is supported for translation. Text input is not supported.
Voice Replication: Voice replication can be inconsistent. Voices might shift after long pauses, assign the wrong gender based on how the speech starts, or get stuck on one voice during rapid multi-speaker conversations.
Language Detection: Language detection struggles with heavy accents, similar languages (e.g., Spanish vs. Portuguese), or rapid language switches. Note: This should only impact the input transcript. Language codes and the final translation should still be accurate.
Background Audio: The model is designed to filter out noise and music to produce clean speech, but not all background audio may be ignored.
Echo Target Language: When echoTargetLanguage: true, background noise or music may introduce artifacts in the translated audio when input audio is already in the target language.

Supported languages

The following languages are supported for Live Translation.

Language	BCP-47 Code	Language	BCP-47 Code
Afrikaans	af	Kazakh	kk
Akan	ak	Khmer	km
Albanian	sq	Kinyarwanda	rw
Amharic	am	Korean	ko
Arabic	ar	Lao	lo
Armenian	hy	Latvian	lv
Azerbaijani	az	Lithuanian	lt
Basque	eu	Macedonian	mk
Belarusian	be	Malay	ms
Bengali	bn	Malayalam	ml
Bulgarian	bg	Marathi	mr
Burmese (Myanmar)	my	Mongolian	mn
Catalan	ca	Nepali	ne
Chinese (Simplified)	zh-Hans	Norwegian	no, nb
Chinese (Traditional)	zh-Hant	Persian	fa
Croatian	hr	Polish	pl
Czech	cs	Portuguese (Brazil)	pt-BR
Danish	da	Portuguese (Portugal)	pt-PT
Dutch	nl	Punjabi	pa
English	en	Romanian	ro
Estonian	et	Russian	ru
Filipino	fil	Serbian	sr
Finnish	fi	Sindhi	sd
French	fr	Sinhala	si
Galician	gl	Slovak	sk
Georgian	ka	Slovenian	sl
German	de	Spanish	es
Greek	el	Sundanese	su
Gujarati	gu	Swahili	sw
Hausa	ha	Swedish	sv
Hebrew	he	Tamil	ta
Hindi	hi	Telugu	te
Hungarian	hu	Thai	th
Icelandic	is	Turkish	tr
Indonesian	id	Ukrainian	uk
Italian	it	Urdu	ur
Japanese	ja	Uzbek	uz
Javanese	jv	Vietnamese	vi
Kannada	kn	Zulu	zu

What's next

Read the full Live API Capabilities guide.
Read the Get started with the SDK guide.
Read the Get started with WebSockets guide.
Read the Ephemeral tokens guide for secure authentication in client-to-server applications.
Clone the Live API examples from GitHub.