
Explore audio capabilities with the Gemini API

Gemini can respond to prompts about audio. For example, Gemini can:

This guide demonstrates different ways to:

Supported audio formats

Gemini supports the following audio format MIME types:

Gemini imposes the following rules on audio:

Gemini represents each second of audio as 25 tokens; for example, one minute of audio is represented as 1,500 tokens.
Gemini can only respond to speech in English.
Gemini can "understand" non-speech components, such as birdsong or sirens.
The maximum supported length of audio data in a single prompt is 9.5 hours. Gemini doesn't limit the number of audio files in a prompt, but the combined length of all audio files in that prompt cannot exceed 9.5 hours.
Gemini downsamples audio files to a 16 Kbps data resolution.
If the audio source contains multiple channels, Gemini combines those channels down to a single channel.
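The token rate and the 9.5-hour cap make it possible to budget a prompt before sending it. The sketch below is a client-side approximation only. It is not part of the Gemini API, and the names `estimate_audio_tokens`, `check_prompt_audio`, `TOKENS_PER_SECOND`, and `MAX_TOTAL_SECONDS` are hypothetical. It also assumes that partial seconds round up per clip, which this guide does not specify. For exact figures, use the API's own token counting.

```python
import math

# Figures taken from this guide; the names are hypothetical helpers, not Gemini API symbols.
TOKENS_PER_SECOND = 25
MAX_TOTAL_SECONDS = 9.5 * 60 * 60  # 9.5 hours = 34,200 seconds


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Approximate the tokens for one audio clip at 25 tokens per second."""
    # Assumption: a partial second is rounded up to a whole token count.
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


def check_prompt_audio(durations_seconds: list[float]) -> int:
    """Return the estimated token total for all clips in one prompt.

    Raises ValueError if the clips' combined length exceeds 9.5 hours,
    since the limit applies to the total, not to each file.
    """
    total = sum(durations_seconds)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(
            f"combined audio length {total:.0f} s exceeds {MAX_TOTAL_SECONDS:.0f} s"
        )
    return sum(estimate_audio_tokens(d) for d in durations_seconds)


print(estimate_audio_tokens(60))         # 1500 (one minute of audio)
print(check_prompt_audio([60, 90.5]))    # 3763 (1500 + 2263)
```

Because the cap applies to the combined length, the check sums every clip before comparing. Many short files can therefore fail the check even when each one is well under 9.5 hours.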