The Gemini Live API supports low-latency, real-time speech to speech translation between 70+ languages using the gemini-3.5-live-translate-preview model. By configuring the Live API with translation settings, you can stream audio in one language and receive translated audio output in another language, enabling seamless real-time voice-to-voice translation.
Live Agent vs. Live Translation
While both use the Live API, the mental model for Live Translation is different from conversational real-time agent interactions.
| Live Agent | Live Translation |
|---|---|
| The model acts as an assistant. It listens, reasons, and takes actions on your behalf. | The model acts as an interpreter. It behaves as a real-time translator pipeline. |
| Uses turn-based interactions. Relies on pauses, intent detection, and handles interruptions. | Uses continuous stream processing. Translates as the speaker talks without waiting for turns. |
| Supports tools and agents. Native support for function calling, Google Search, and instructions. | Supports translation only. Pure low-latency translation; no support for tools or instructions. |
| Fully multimodal. Supports text, audio, video, and image inputs. | Audio restricted. Input is limited to audio to ensure strict real-time latency thresholds. |
| Granular configuration. Uses generation, speech, tools, and system instructions. | Simplified configuration. Set target_language_code and toggles like echo_target_language. |
Get started
The following examples demonstrate how to initialize a client and connect to the Live API with a translation configuration.
Python
import asyncio
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-3.5-live-translate-preview"
config = types.LiveConnectConfig(
response_modalities=["AUDIO"],
input_audio_transcription=types.AudioTranscriptionConfig(),
output_audio_transcription=types.AudioTranscriptionConfig(),
translation_config=types.TranslationConfig(
target_language_code="pl",
echo_target_language=True
)
)
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
print("Session started with translation")
# Start receiving the translated audio stream
async for response in session.receive():
if response.server_content:
if response.server_content.input_transcription:
print(f"Input transcript: {response.server_content.input_transcription.text}")
if response.server_content.output_transcription:
print(f"Output transcript: {response.server_content.output_transcription.text}")
if response.server_content.model_turn:
for part in response.server_content.model_turn.parts:
if part.inline_data:
audio_data = part.inline_data.data
# Play or process the translated audio chunk
print(f"Received audio chunk ({len(audio_data)} bytes)")
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-3.5-live-translate-preview';
const config = {
responseModalities: [Modality.AUDIO],
inputAudioTranscription: {},
outputAudioTranscription: {},
translationConfig: {
targetLanguageCode: 'pl',
echoTargetLanguage: true
}
};
async function main() {
const session = await ai.live.connect({
model: model,
config: config,
callbacks: {
onopen: () => console.debug('Opened'),
onmessage: (message) => {
const content = message.serverContent;
if (content?.inputTranscription) {
console.log('Input transcript:', content.inputTranscription.text);
}
if (content?.outputTranscription) {
console.log('Output transcript:', content.outputTranscription.text);
}
if (content?.modelTurn?.parts) {
for (const part of content.modelTurn.parts) {
if (part.inlineData) {
const audioData = part.inlineData.data;
// Play or process the translated audio chunk (base64 encoded)
console.debug(`Received audio chunk (${audioData.length} bytes)`);
}
}
}
},
onerror: (e) => console.debug('Error:', e.message),
onclose: (e) => console.debug('Close:', e.reason),
},
});
console.debug("Session started with translation");
}
main();
WebSockets
const API_KEY = "YOUR_API_KEY";
const MODEL_NAME = "gemini-3.5-live-translate-preview";
const WS_URL = `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=${API_KEY}`;
const websocket = new WebSocket(WS_URL);
websocket.onopen = () => {
console.log('WebSocket Connected');
const setupMessage = {
setup: {
model: `models/${MODEL_NAME}`,
generationConfig: {
responseModalities: ['AUDIO'],
inputAudioTranscription: {},
outputAudioTranscription: {},
translationConfig: {
targetLanguageCode: 'pl',
echoTargetLanguage: true
}
}
}
};
websocket.send(JSON.stringify(setupMessage));
};
websocket.onmessage = (event) => {
const response = JSON.parse(event.data);
if (response.serverContent) {
const content = response.serverContent;
if (content.inputTranscription) {
console.log('Input transcript:', content.inputTranscription.text, `(${content.inputTranscription.languageCode})`);
}
if (content.outputTranscription) {
console.log('Output transcript:', content.outputTranscription.text, `(${content.outputTranscription.languageCode})`);
}
if (content.modelTurn?.parts) {
for (const part of content.modelTurn.parts) {
if (part.inlineData) {
const audioData = part.inlineData.data;
// Play or process the translated audio chunk (base64 encoded)
console.debug(`Received audio chunk (${audioData.length} bytes)`);
}
}
}
}
};
Sending audio
To stream voice inputs for translation, you send raw, little-endian, 16-bit PCM audio.
- Input audio format: Raw 16-bit PCM at 16kHz (mono, little-endian).
- Output audio format: Raw 16-bit PCM at 24kHz (mono, little-endian).
- Chunk Size and Latency: Send audio in chunks of 100ms.
The following examples show how to send audio chunks to the session.
Python
# Assuming 'chunk' is your raw PCM audio bytes
await session.send_realtime_input(
audio=types.Blob(
data=chunk,
mime_type="audio/pcm;rate=16000"
)
)
JavaScript
// Assuming 'chunk' is a Buffer of raw PCM audio
session.sendRealtimeInput({
audio: {
data: chunk.toString('base64'),
mimeType: 'audio/pcm;rate=16000'
}
});
WebSockets
// Assuming 'chunk' is a Buffer of raw PCM audio
function sendAudioChunk(chunk) {
if (websocket.readyState === WebSocket.OPEN) {
const audioMessage = {
realtimeInput: {
audio: {
data: chunk.toString('base64'),
mimeType: 'audio/pcm;rate=16000'
}
}
};
websocket.send(JSON.stringify(audioMessage));
}
}
Configuration
To enable translation, you must specify the translationConfig within the generationConfig during the session setup.
Setup message configuration
The generationConfig supports the following fields to enable transcripts:
inputAudioTranscription: An object that, when present, enables the model to send text transcripts of the input audio.outputAudioTranscription: An object that, when present, enables the model to send text transcripts of the output (translated) audio.
The translationConfig supports the following fields:
targetLanguageCode: The BCP-47 language code of the language you want the model to translate into (e.g.,"pl"for Polish,"es"for Spanish). Defaults to"en".echoTargetLanguage: A boolean indicating how input audio that is already in the target language should be handled. If set totrue, the model will echo (parrot) input audio that is already in the target language. If set tofalse, the model will stay silent when the input speech is already in the target language. Defaults tofalse.
Here is an example of the setup message structure:
"setup": {
"model": "models/gemini-3.5-live-translate-preview",
"generationConfig": {
"responseModalities": [
"AUDIO"
],
"inputAudioTranscription": {},
"outputAudioTranscription": {},
"translationConfig": {
"targetLanguageCode": "pl",
"echoTargetLanguage": true
}
}
}
Ephemeral tokens for client-side applications
For client-to-server applications, you can use ephemeral tokens (currently in v1alpha) to avoid exposing your API key.
When using ephemeral tokens with Live Translation:
- You must use the
v1alphaendpoint. - Locking configuration: By default, you should specify the
translationConfigin the token creation constraints on your server. This ensures the translation configuration is locked and cannot be tampered with by the client. - Unlocking configuration: If you want to be able to set the
translationConfigon the client-side (for example, to let a user choose their own target language), you must omit it from the token creation request and set"lock_additional_fields": []instead. This will unlocktranslationConfigto be set on the client-side.
Creating a constrained ephemeral token
The following examples demonstrate how to create an ephemeral token with translation constraints.
Python
import datetime
from google import genai
now = datetime.datetime.now(tz=datetime.timezone.utc)
client = genai.Client(
http_options={'api_version': 'v1alpha'}
)
token = client.auth_tokens.create(
config = {
'uses': 1,
'expire_time': now + datetime.timedelta(minutes=30),
'live_connect_constraints': {
'model': 'gemini-3.5-live-translate-preview',
'config': {
'translation_config': {
'target_language_code': 'pl',
'echo_target_language': True
}
}
},
'http_options': {'api_version': 'v1alpha'},
}
)
JavaScript
import { GoogleGenAI } from "@google/genai";
const client = new GoogleGenAI({});
const expireTime = new Date(Date.now() + 30 * 60 * 1000).toISOString();
const token = await client.authTokens.create({
config: {
uses: 1,
expireTime: expireTime,
liveConnectConstraints: {
model: 'gemini-3.5-live-translate-preview',
config: {
responseModalities: ['AUDIO'],
inputAudioTranscription: {},
outputAudioTranscription: {},
translationConfig: {
targetLanguageCode: 'pl',
echoTargetLanguage: true
}
}
},
httpOptions: {
apiVersion: 'v1alpha'
}
},
});
Limitations
- Input Modalities: Only audio input is supported for translation. Text input is not supported.
- Voice Replication: Voice replication can be inconsistent. Voices might shift after long pauses, assign the wrong gender based on how the speech starts, or get stuck on one voice during rapid multi-speaker conversations.
- Language Detection: Language detection struggles with heavy accents, similar languages (e.g., Spanish vs. Portuguese), or rapid language switches. Note: This should only impact the input transcript. Language codes and the final translation should still be accurate.
- Background Audio: The model is designed to filter out noise and music to produce clean speech, but not all background audio may be ignored.
- Echo Target Language: When
echoTargetLanguage: true, background noise or music may introduce artifacts in the translated audio when input audio is already in the target language.
Supported languages
The following languages are supported for Live Translation.
| Language | BCP-47 Code | Language | BCP-47 Code |
|---|---|---|---|
| Afrikaans | af | Kazakh | kk |
| Akan | ak | Khmer | km |
| Albanian | sq | Kinyarwanda | rw |
| Amharic | am | Korean | ko |
| Arabic | ar | Lao | lo |
| Armenian | hy | Latvian | lv |
| Azerbaijani | az | Lithuanian | lt |
| Basque | eu | Macedonian | mk |
| Belarusian | be | Malay | ms |
| Bengali | bn | Malayalam | ml |
| Bulgarian | bg | Marathi | mr |
| Burmese (Myanmar) | my | Mongolian | mn |
| Catalan | ca | Nepali | ne |
| Chinese (Simplified) | zh-Hans | Norwegian | no, nb |
| Chinese (Traditional) | zh-Hant | Persian | fa |
| Croatian | hr | Polish | pl |
| Czech | cs | Portuguese (Brazil) | pt-BR |
| Danish | da | Portuguese (Portugal) | pt-PT |
| Dutch | nl | Punjabi | pa |
| English | en | Romanian | ro |
| Estonian | et | Russian | ru |
| Filipino | fil | Serbian | sr |
| Finnish | fi | Sindhi | sd |
| French | fr | Sinhala | si |
| Galician | gl | Slovak | sk |
| Georgian | ka | Slovenian | sl |
| German | de | Spanish | es |
| Greek | el | Sundanese | su |
| Gujarati | gu | Swahili | sw |
| Hausa | ha | Swedish | sv |
| Hebrew | he | Tamil | ta |
| Hindi | hi | Telugu | te |
| Hungarian | hu | Thai | th |
| Icelandic | is | Turkish | tr |
| Indonesian | id | Ukrainian | uk |
| Italian | it | Urdu | ur |
| Japanese | ja | Uzbek | uz |
| Javanese | jv | Vietnamese | vi |
| Kannada | kn | Zulu | zu |
What's next
- Read the full Live API Capabilities guide.
- Read the Get started with the SDK guide.
- Read the Get started with WebSockets guide.
- Read the Ephemeral tokens guide for secure authentication in client-to-server applications.
- Clone the Live API examples from GitHub.