The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting, meaning you can include those types of media files in your prompts. For small files, you can point the Gemini model directly to a local file when providing a prompt. Upload larger files with the File API before including them in prompts.
The File API lets you store up to 20GB of files per project, with each file not exceeding 2GB in size. Files are stored for 48 hours. During that time, you can use your API key to access them for generation, but you cannot download them from the API. The File API is available at no cost in all regions where the Gemini API is available.
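The storage limits above can be checked locally before an upload. The following is a minimal sketch, not part of the SDK: the helper name `fits_file_api_limits` is hypothetical, and treating "GB" as 10**9 bytes is an assumption, since the guide does not say whether it means GB or GiB.

```python
# Limits quoted in this guide. Treating "GB" as 10**9 bytes is an assumption.
MAX_FILE_BYTES = 2 * 10**9
MAX_PROJECT_BYTES = 20 * 10**9


def fits_file_api_limits(file_bytes: int, project_bytes_used: int = 0) -> bool:
    """Return True if a file of this size fits the per-file and per-project limits."""
    if file_bytes < 0 or project_bytes_used < 0:
        raise ValueError("sizes must be non-negative")
    if file_bytes > MAX_FILE_BYTES:
        return False
    return project_bytes_used + file_bytes <= MAX_PROJECT_BYTES
```

For a local file, the size could come from `os.path.getsize(path)`. The `project_bytes_used` value is something you would have to track yourself in this sketch.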
The File API handles inputs that can be used to generate content with model.generateContent or model.streamGenerateContent. For information on valid file formats (MIME types) and supported models, see Supported file formats.
This guide shows how to use the File API to upload media files and include them in a GenerateContent call to the Gemini API. For more information, see the code samples.
Before you begin: Set up your project and API key
Before calling the Gemini API (or its File API), you need to set up your project and configure your API key.
Install the Python SDK and import packages
The Python SDK for the Gemini API is contained in the google-generativeai package.
Install the dependency using pip:
pip install -q -U google-generativeai
Import the necessary packages:
import google.generativeai as genai
from IPython.display import Markdown
Secure and configure your API key
You need an API key to call the Gemini API (and its File API). If you don't already have one, create a key in Google AI Studio.
Store your API key in a Colab Secret named GOOGLE_API_KEY. If you're unfamiliar with Colab Secrets, refer to the Authentication quickstart.
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
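Outside Colab, the key has to come from somewhere else. The following is one possible sketch that reads it from an environment variable. The helper `load_api_key` is hypothetical, and using an environment variable named GOOGLE_API_KEY is an assumption rather than something this guide prescribes.

```python
import os


def load_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key from an environment mapping, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before calling the API.")
    return key


# The key would then be passed to the SDK the same way as in the Colab cell:
# genai.configure(api_key=load_api_key())
```

Failing early with a clear message avoids a less obvious authentication error on the first API call.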
Prompting with images
In this tutorial, you upload a sample image using the File API and then use it to generate content.
Upload an image file
Refer to the Appendix section to learn how to upload your own file.
Prepare a sample image to upload:
curl -o image.jpg https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg
Upload that file using media.upload so that you can access it with other API calls:
sample_file = genai.upload_file(path="image.jpg", display_name="Sample drawing")
print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")
The response shows that the uploaded image is stored with the specified display_name and has a uri to reference the file in Gemini API calls. Use the response to track how uploaded files are mapped to URIs. Depending on your use case, you can store the URIs in structures, such as a dict or a database.
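As a rough illustration of tracking uploads in a dict, the sketch below keys each entry by the file's unique name. The helper `remember_upload` is hypothetical, and the `SimpleNamespace` object is only a stand-in with made-up values for the object that `genai.upload_file` returns.

```python
from types import SimpleNamespace


def remember_upload(registry: dict, uploaded_file) -> None:
    """Record an uploaded file's display name and URI under its unique name."""
    registry[uploaded_file.name] = {
        "display_name": uploaded_file.display_name,
        "uri": uploaded_file.uri,
    }


# Stand-in for the object returned by genai.upload_file; the values are made up.
fake_upload = SimpleNamespace(
    name="files/example-id",
    display_name="Sample drawing",
    uri="https://example.invalid/files/example-id",
)

uploads = {}
remember_upload(uploads, fake_upload)
print(uploads["files/example-id"]["uri"])
```

Keying by name rather than display name avoids collisions when two uploads share a display name.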
Get the image file's metadata
After uploading the file, you can verify the API successfully stored the file and get its metadata by calling files.get through the SDK.
This method lets you get the metadata for an uploaded file associated with the Google Cloud project linked to your API key. Only the name (and by extension, the uri) is unique. Use the display_name to identify files only if you manage uniqueness yourself.
file = genai.get_file(name=sample_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")
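Because display names are not guaranteed to be unique, a lookup by display name should handle the case of zero or several matches. The sketch below shows one way to do that. The helper `find_by_display_name` is hypothetical, and the file objects are stand-ins; in practice they might come from the SDK's file listing call (`genai.list_files()`, as far as I know).

```python
from types import SimpleNamespace


def find_by_display_name(files, display_name):
    """Return the single file with this display name, or raise if none or several match."""
    matches = [f for f in files if f.display_name == display_name]
    if len(matches) != 1:
        raise LookupError(
            f"expected 1 file named {display_name!r}, found {len(matches)}"
        )
    return matches[0]


# Stand-in file objects with made-up names.
stored = [
    SimpleNamespace(name="files/a", display_name="Sample drawing"),
    SimpleNamespace(name="files/b", display_name="Report"),
    SimpleNamespace(name="files/c", display_name="Report"),
]

print(find_by_display_name(stored, "Sample drawing").name)
```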
Generate content using the uploaded image file
After uploading the image, you can make GenerateContent requests that reference the uri in the response (from either uploading the file or directly getting the metadata of the file).
In this example, you create a prompt that contains the reference to the uploaded file followed by a text instruction:
# The Gemini 1.5 models are versatile and work with multimodal prompts
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")
response = model.generate_content([sample_file, "Describe the image with a creative description."])
Markdown(">" + response.text)
Delete the image file
Files are automatically deleted after 48 hours. You can also manually delete them using files.delete through the SDK.
genai.delete_file(sample_file.name)
print(f'Deleted {sample_file.display_name}.')
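If you keep your own record of upload times, the 48-hour retention window can be checked locally before reusing a stored URI. This is a sketch under that assumption. The helpers `expires_at` and `is_expired` are hypothetical, and the upload timestamp is one you record yourself.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=48)  # retention period stated in this guide


def expires_at(uploaded_at: datetime) -> datetime:
    """Return the time at which a file uploaded at `uploaded_at` is removed."""
    return uploaded_at + RETENTION


def is_expired(uploaded_at: datetime, now: datetime = None) -> bool:
    """Return True if the retention window has passed."""
    now = datetime.now(timezone.utc) if now is None else now
    return now >= expires_at(uploaded_at)


uploaded = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(uploaded).isoformat())
```

Use timezone-aware datetimes throughout; mixing naive and aware values raises a TypeError on comparison.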
Prompting with videos
In this tutorial, you upload a sample video using the File API and then use it to generate content.
Upload a video file
The Gemini API accepts video file formats directly. This example uses the short film "Big Buck Bunny".
"Big Buck Bunny" is (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org and licensed under the Creative Commons Attribution 3.0 License.
Refer to the Appendix section to learn how to upload your own file.
Prepare the sample video file for upload:
wget https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4
Upload that file using media.upload so that you can access it with other API calls:
video_file_name = "BigBuckBunny_320x180.mp4"
print("Uploading file...")
video_file = genai.upload_file(path=video_file_name)
print(f"Completed upload: {video_file.uri}")
Verify the video file's upload state
Verify that the API has successfully uploaded the video file by calling the files.get method through the SDK.
Video files have a State field from the File API. When a video is uploaded, it will be in a PROCESSING state until it is ready for inference. Only ACTIVE files can be used for model inference.
import time

# Poll every 10 seconds until the file leaves the PROCESSING state.
while video_file.state.name == "PROCESSING":
    print('.', end='')
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

if video_file.state.name == "FAILED":
    raise ValueError(video_file.state.name)
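The loop above waits indefinitely if a file never leaves PROCESSING. A variant with a timeout could look like the sketch below. The helper `wait_until_active` and its default timeout are hypothetical, and the fetch function is injected so that the example runs without network access; with the SDK you would pass `genai.get_file`.

```python
import time
from types import SimpleNamespace


def wait_until_active(fetch_file, file_name, poll_seconds=10.0,
                      timeout_seconds=600.0, sleep=time.sleep):
    """Poll fetch_file(file_name) until the file is ACTIVE, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        current = fetch_file(file_name)
        state = current.state.name
        if state == "ACTIVE":
            return current
        if state == "FAILED":
            raise ValueError(f"Processing failed for {file_name}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{file_name} still {state} after {timeout_seconds} s")
        sleep(poll_seconds)


# Stand-in for genai.get_file: reports PROCESSING twice, then ACTIVE.
_states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])


def fake_get_file(name):
    return SimpleNamespace(name=name, state=SimpleNamespace(name=next(_states)))


ready = wait_until_active(fake_get_file, "files/example-video", sleep=lambda s: None)
print(ready.state.name)
```

With the SDK, the call would be roughly `video_file = wait_until_active(genai.get_file, video_file.name)`.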
Get the video file's metadata
You can get the uploaded video file's metadata at any time by calling the files.get method through the SDK.
This method lets you get the metadata for an uploaded file associated with the Google Cloud project linked to your API key. Only the name (and by extension, the uri) is unique. Use the display_name to identify files only if you manage uniqueness yourself.
file = genai.get_file(name=video_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")
Generate content using the uploaded video file
After uploading the video, you can make GenerateContent requests that reference the uri in the response (from either uploading the file or directly getting the metadata of the file).
Make sure that you've verified the video file's upload state (section above) before running inference on the video.
# Create the prompt.
prompt = "Describe this video."
# The Gemini 1.5 models are versatile and work with multimodal prompts
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")
# Make the LLM request.
print("Making LLM inference request...")
response = model.generate_content([video_file, prompt],
                                  request_options={"timeout": 600})
print(response.text)
Delete the video file
Files are automatically deleted after 48 hours. You can also manually delete them using files.delete through the SDK.
genai.delete_file(video_file.name)
print(f'Deleted file {video_file.uri}')
Supported file formats
Gemini models support prompting with multiple file formats. This section explains considerations in using general media formats for prompting, specifically image, audio, video, and plain text files. You can use media files for prompting only with specific model versions, as shown in the following table.
Model | Images | Audio | Video | Plain text |
---|---|---|---|---|
Gemini 1.5 Pro (release 008 and later) | ✔ (3600 max image files) | ✔ | ✔ | ✔ |
Image formats
You can use image data for prompting with Gemini 1.5 models. When you use images for prompting, they are subject to the following limitations and requirements:
- Images must be in one of the following image data MIME types:
- PNG - image/png
- JPEG - image/jpeg
- WEBP - image/webp
- HEIC - image/heic
- HEIF - image/heif
- Maximum of 3600 images for the Gemini 1.5 models.
- No specific limits to the number of pixels in an image; however, larger images are scaled down to fit a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.
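To get a feel for the downscaling rule in the last bullet, the sketch below computes the dimensions an image would have after being fit into 3072 x 3072 with its aspect ratio preserved. The helper `scaled_size` is hypothetical, and the rounding behavior is an assumption; the guide does not describe how the service rounds.

```python
MAX_SIDE = 3072  # maximum resolution per side stated in this guide


def scaled_size(width: int, height: int, max_side: int = MAX_SIDE):
    """Return (width, height) after fitting within max_side x max_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never scale up; shrink by the factor needed for the longer side.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(scaled_size(6144, 3072))
print(scaled_size(1024, 768))
```

Images that already fit are returned unchanged, since the scale factor is capped at 1.0.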
Audio formats
You can use audio data for prompting with the Gemini 1.5 models. When you use audio for prompting, it is subject to the following limitations and requirements:
- Audio data is supported in the following common audio format MIME types:
- WAV - audio/wav
- MP3 - audio/mp3
- AIFF - audio/aiff
- AAC - audio/aac
- OGG Vorbis - audio/ogg
- FLAC - audio/flac
- The maximum supported length of audio data in a single prompt is 9.5 hours.
- Audio files are resampled down to a 16 Kbps data resolution, and multiple channels of audio are combined into a single channel.
- There is no specific limit to the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt cannot exceed 9.5 hours.
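The combined-length rule can be checked before building a prompt. The sketch below assumes you already know each clip's duration in seconds; the helper `audio_fits_prompt` is hypothetical.

```python
MAX_AUDIO_SECONDS = 9.5 * 3600  # 9.5 hours, the limit stated in this guide


def audio_fits_prompt(durations_seconds) -> bool:
    """Return True if the combined audio length stays within the prompt limit."""
    durations = list(durations_seconds)
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    return sum(durations) <= MAX_AUDIO_SECONDS


print(audio_fits_prompt([3600, 1800, 5400]))
```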
Video formats
You can use video data for prompting with the Gemini 1.5 models.
Video data is supported in the following common video format MIME types:
- video/mp4
- video/mpeg
- video/mov
- video/avi
- video/x-flv
- video/mpg
- video/webm
- video/wmv
- video/3gpp
The File API service samples videos into images at 1 frame per second (FPS). This sampling rate may change to provide the best inference quality. Individual images take up 258 tokens regardless of resolution and quality.
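From the figures above, a rough estimate of the tokens used by a video's sampled frames is the duration in seconds multiplied by 258. The sketch below does that arithmetic. The helper `estimate_frame_tokens` is hypothetical, rounding partial seconds up is an assumption, and the estimate covers sampled frames only, not any audio track or text in the prompt.

```python
import math

TOKENS_PER_FRAME = 258   # per sampled image, as stated in this guide
FRAMES_PER_SECOND = 1    # current sampling rate, which may change


def estimate_frame_tokens(duration_seconds: float) -> int:
    """Estimate tokens consumed by the frames sampled from a video."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    frames = math.ceil(duration_seconds * FRAMES_PER_SECOND)
    return frames * TOKENS_PER_FRAME


print(estimate_frame_tokens(60))
```

For a one-minute clip this gives 60 frames x 258 tokens = 15,480 tokens.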
Plain text formats
The File API supports uploading plain text files with the following MIME types:
- text/plain
- text/html
- text/css
- text/javascript
- application/x-javascript
- text/x-typescript
- application/x-typescript
- text/csv
- text/markdown
- text/x-python
- application/x-python-code
- application/json
- text/xml
- application/rtf
- text/rtf
For plain text files with a MIME type not on the list, you can try specifying one of the above MIME types manually.
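One way to apply this advice is to guess the MIME type from the filename and fall back to a listed type when the guess is not supported. The sketch below uses Python's standard `mimetypes` module. The helper `pick_text_mime_type` is hypothetical, and choosing text/plain as the fallback is an assumption.

```python
import mimetypes

# Plain text MIME types listed in this guide.
SUPPORTED_TEXT_MIME_TYPES = {
    "text/plain", "text/html", "text/css", "text/javascript",
    "application/x-javascript", "text/x-typescript",
    "application/x-typescript", "text/csv", "text/markdown",
    "text/x-python", "application/x-python-code", "application/json",
    "text/xml", "application/rtf", "text/rtf",
}


def pick_text_mime_type(filename: str, fallback: str = "text/plain") -> str:
    """Return the guessed MIME type if it is on the list, else the fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed if guessed in SUPPORTED_TEXT_MIME_TYPES else fallback


print(pick_text_mime_type("notes.txt"))
```

The result could then be passed when uploading, for example `genai.upload_file(path="notes.txt", mime_type=pick_text_mime_type("notes.txt"))`, assuming the `mime_type` parameter of the SDK's upload call.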
Appendix: Uploading files to Colab
This notebook uses the File API with files that were downloaded from the internet. If you're running this in Colab and want to use your own files, you first need to upload them to the Colab instance.
First, click Files on the left sidebar, then click the Upload button.
Next, you'll upload that file to the File API. In the form for the code cell below, enter the filename for the file you uploaded and provide an appropriate display name for the file, then run the cell.
my_filename = "gemini_logo.png" # @param {type:"string"}
my_file_display_name = "Gemini Logo" # @param {type:"string"}
my_file = genai.upload_file(path=my_filename,
display_name=my_file_display_name)
print(f"Uploaded file '{my_file.display_name}' as: {my_file.uri}")