The Gemini API supports multimodal prompting: you can include text, image, audio, and video data in your prompts. For small files, you can point the Gemini model directly to a local file when providing a prompt. Upload larger files with the File API before including them in prompts.
The File API lets you store up to 20 GB of files per project, with a maximum size of 2 GB per file. Files are stored for 48 hours. During that period you can use them for generation with your API key, but you cannot download them from the API. The File API is available at no cost in all regions where the Gemini API is available.
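The storage limits above can be checked locally before you attempt an upload. The following is a minimal sketch, not part of the SDK: the helper name `checkUploadSize` is hypothetical, and treating 1 GB as 2^30 bytes is an assumption, since this guide does not say which definition applies.

```go
package main

import (
	"fmt"
)

// Limits quoted in this guide. Treating 1 GB as 2^30 bytes is an assumption;
// the guide does not say whether it means decimal or binary gigabytes.
const (
	gb              int64 = 1 << 30
	maxFileBytes          = 2 * gb
	maxProjectBytes       = 20 * gb
)

// checkUploadSize reports whether a file of fileBytes fits within the per-file
// limit and within the project quota, given usedBytes already stored.
func checkUploadSize(fileBytes, usedBytes int64) error {
	if fileBytes > maxFileBytes {
		return fmt.Errorf("file is %d bytes; per-file limit is %d", fileBytes, maxFileBytes)
	}
	if usedBytes+fileBytes > maxProjectBytes {
		return fmt.Errorf("upload would bring project storage to %d bytes; limit is %d",
			usedBytes+fileBytes, maxProjectBytes)
	}
	return nil
}

func main() {
	fmt.Println(checkUploadSize(500<<20, 0))  // small file, empty project
	fmt.Println(checkUploadSize(3*gb, 0))     // exceeds the per-file limit
	fmt.Println(checkUploadSize(gb, 19*gb+1)) // exceeds the project quota
}
```

In practice you would get `fileBytes` from `os.Stat` and track `usedBytes` yourself, since this sketch does not query the service.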
The File API handles inputs that can be used to generate content with model.generateContent or model.streamGenerateContent. For information on valid file formats (MIME types) and supported models, see Supported file formats.
This guide shows how to use the File API to upload media files and include them in a GenerateContent call to the Gemini API. For more information, see the code samples.
Before you begin: Set up your project and API key
Before calling the Gemini API (or its File API), you need to set up your project and configure your API key.
Get and secure your API key
You need an API key to call the Gemini API (and its File API). If you don't already have one, create a key in Google AI Studio.
It's strongly recommended that you do not check an API key into your version control system. Instead, you should use a secrets store for your API key.
This tutorial assumes that you're accessing your API key as an environment variable.
Import the SDK package and configure your API key
In your application, do the following:
In your module directory, get the Go SDK package:
go get github.com/google/generative-ai-go
Import the package and configure the service with your API key:
import (
	"context"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

ctx := context.Background()
// Access your API key as an environment variable
client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
if err != nil {
	log.Fatal(err)
}
defer client.Close()
// ...
Prompting with images
In this tutorial, you upload a sample image using the File API and then use it to generate content.
Upload an image file
Prepare a sample image to upload:
curl -o image.jpg https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg
Upload that file using media.upload so that you can access it with other API calls:
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Use client.UploadFile to upload a file to the service.
	// Pass it an io.Reader.
	f, err := os.Open("image.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Optionally set a display name.
	opts := genai.UploadFileOptions{DisplayName: "Sample drawing"}
	// Let the API generate a unique `name` for the file by passing an empty string.
	// If you specify a `name`, then it has to be globally unique.
	file, err := client.UploadFile(ctx, "", f, &opts)
	if err != nil {
		log.Fatal(err)
	}

	// View the response.
	fmt.Printf("Uploaded file %s as: %q\n", file.DisplayName, file.URI)
}
The response shows that the uploaded image is stored with the specified DisplayName and has a uri to reference the file in Gemini API calls. Use the response to track how uploaded files are mapped to URIs.
Depending on your use case, you can store the URIs in structures, such as a hash table or a database.
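As an illustration of the hash-table option, the sketch below keeps an in-memory map from a local path to the name and URI reported by the upload response. The `uploaded` and `registry` types are hypothetical and the values are placeholders. A real application would fill them from the response and would likely persist them.

```go
package main

import "fmt"

// uploaded records what the upload response reported for one file.
// The type and field names are hypothetical, not part of the SDK.
type uploaded struct {
	Name string // unique resource name assigned by the service
	URI  string // URI to reference in prompts
}

// registry maps a caller-chosen key (here, the local path) to the upload record.
type registry map[string]uploaded

// add stores the record and reports whether it replaced an existing entry.
func (r registry) add(localPath string, u uploaded) bool {
	_, existed := r[localPath]
	r[localPath] = u
	return existed
}

func main() {
	reg := registry{}
	// Placeholder values; real ones come from the upload response.
	reg.add("image.jpg", uploaded{
		Name: "files/example-id",
		URI:  "https://example.invalid/files/example-id",
	})
	if u, ok := reg["image.jpg"]; ok {
		fmt.Println(u.URI)
	}
}
```

Keying on the local path rather than the display name avoids collisions, because display names are not guaranteed to be unique.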
Get the image file's metadata
After uploading the file, you can verify that the API successfully stored the file and get its metadata by calling files.get through the SDK.
This method lets you get the metadata for an uploaded file associated with the Google Cloud project linked to your API key. Only the name (and by extension, the uri) is unique. Use the DisplayName to identify files only if you manage uniqueness yourself.
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ... upload the file as shown earlier; `file` is the value
	// returned by client.UploadFile.

	// Get a file's metadata.
	r, err := client.GetFile(ctx, file.Name)
	if err != nil {
		log.Fatal(err)
	}

	// View the response.
	fmt.Printf("Retrieved remote file %s as: %q\n", r.DisplayName, r.URI)
}
Generate content using the uploaded image file
After uploading the file, you can make GenerateContent requests that reference the uri in the response (from either uploading the file or directly getting the metadata of the file).
In this example, you create a prompt that combines the URI reference for the uploaded file with a text instruction:
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ... upload the file as shown earlier; `file` is the value
	// returned by client.UploadFile.

	// The Gemini 1.5 models are versatile and work with multimodal prompts
	model := client.GenerativeModel("gemini-1.5-flash")

	// Create a prompt using the URI reference for the uploaded file and text.
	prompt := []genai.Part{
		genai.FileData{URI: file.URI},
		genai.Text("Describe the image with a creative description."),
	}

	// Generate content using the prompt.
	resp, err := model.GenerateContent(ctx, prompt...)
	if err != nil {
		log.Fatal(err)
	}

	// Handle the response of generated text
	for _, c := range resp.Candidates {
		if c.Content != nil {
			fmt.Println(*c.Content)
		}
	}
}
Delete the image file
Files are automatically deleted after 48 hours. You can also manually delete them using files.delete through the SDK.
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ... upload the file as shown earlier; `file` is the value
	// returned by client.UploadFile.

	// Delete the file
	if err := client.DeleteFile(ctx, file.Name); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Deleted %s\n", file.DisplayName)
}
Prompting with videos
In this tutorial, you upload a sample video using the File API and then use it to generate content.
Upload a video file
The Gemini API accepts video file formats directly. This example uses the short film "Big Buck Bunny".
"Big Buck Bunny" is (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org and licensed under the Creative Commons Attribution 3.0 License.
Prepare the sample video file for upload:
wget https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4
Upload that file using media.upload so that you can access it with other API calls:
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Use client.UploadFile to upload a file to the service.
	// Pass it an io.Reader.
	f, err := os.Open("BigBuckBunny_320x180.mp4")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Optionally set a display name.
	opts := genai.UploadFileOptions{DisplayName: "Sample video"}
	// Let the API generate a unique `name` for the file by passing an empty string.
	// If you specify a `name`, then it has to be globally unique.
	// The result is named `file` to match the snippets that follow.
	file, err := client.UploadFile(ctx, "", f, &opts)
	if err != nil {
		log.Fatal(err)
	}

	// View the response.
	fmt.Printf("Uploaded file %s as: %q\n", file.DisplayName, file.URI)
}
The response shows that the uploaded video is stored with the specified DisplayName and has a uri to reference the file in Gemini API calls. Use the response to track how uploaded files are mapped to URIs.
Depending on your use case, you can store the URIs in structures, such as a hash table or a database.
Verify the video file's upload state
Verify that the API has successfully uploaded the video file by calling the files.get method through the SDK.
Video files have a State field from the File API. When a video is uploaded, it will be in a PROCESSING state until it is ready for inference. Only ACTIVE files can be used for model inference.
import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Upload the video file using the File API
	// ... `file` is the value returned by client.UploadFile.

	// Get a file's metadata.
	r, err := client.GetFile(ctx, file.Name)
	if err != nil {
		log.Fatal(err)
	}

	// Poll GetFile() on a set interval (10 seconds here) to check file state.
	for r.State == genai.FileStateProcessing {
		fmt.Print(".")
		// Sleep for 10 seconds
		time.Sleep(10 * time.Second)
		// Fetch the file from the API again.
		r, err = client.GetFile(ctx, file.Name)
		if err != nil {
			log.Fatal(err)
		}
	}

	// View the response.
	fmt.Printf("File %s is ready for inference as: %q\n", r.DisplayName, r.URI)
}
Get the video file's metadata
You can get the uploaded video file's metadata at any time by calling the files.get method through the SDK.
This method lets you get the metadata for an uploaded file associated with the Google Cloud project linked to your API key. Only the name (and by extension, the uri) is unique. Use the DisplayName to identify files only if you manage uniqueness yourself.
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ... upload the file as shown earlier; `file` is the value
	// returned by client.UploadFile.

	// Get a file's metadata.
	r, err := client.GetFile(ctx, file.Name)
	if err != nil {
		log.Fatal(err)
	}

	// View the response.
	fmt.Printf("Retrieved remote file %s as: %q\n", r.DisplayName, r.URI)
}
Generate content using the uploaded video file
After uploading the file, you can make GenerateContent requests that reference the uri in the response (from either uploading the file or directly getting the metadata of the file).
Make sure that you've verified the video file's upload state (section above) before running inference on the video.
In this example, you create a prompt that combines the URI reference for the uploaded file with a text instruction:
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go.
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable.
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Get a file's metadata. `file` is the value returned by client.UploadFile.
	r, err := client.GetFile(ctx, file.Name)
	if err != nil {
		log.Fatal(err)
	}

	// Initialize the generative model with a model that supports multimodal input.
	model := client.GenerativeModel("gemini-1.5-flash")

	// Create a prompt using the URI reference for the uploaded file and text.
	prompt := []genai.Part{
		genai.FileData{URI: r.URI},
		genai.Text("Describe the video with a creative description."),
	}

	// Generate content using the prompt.
	resp, err := model.GenerateContent(ctx, prompt...)
	if err != nil {
		log.Fatal(err)
	}

	// Handle the response of generated text.
	for _, c := range resp.Candidates {
		if c.Content != nil {
			fmt.Println(*c.Content)
		}
	}
}
Delete the video file
Files are automatically deleted after 48 hours. You can also manually delete them using files.delete through the SDK.
import (
	"context"
	"fmt"
	"log"
	"os"

	// Import the GenerativeAI package for Go
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// Access your API key as an environment variable
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ... upload the file as shown earlier; `file` is the value
	// returned by client.UploadFile.

	// Delete the file
	if err := client.DeleteFile(ctx, file.Name); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Deleted %s\n", file.DisplayName)
}
Supported file formats
Gemini models support prompting with multiple file formats. This section explains considerations in using general media formats for prompting, specifically image, audio, video, and plain text files. You can use media files for prompting only with specific model versions, as shown in the following table.
Model | Images | Audio | Video | Plain text |
---|---|---|---|---|
Gemini 1.5 Pro (release 008 and later) | ✔ (3600 max image files) | ✔ | ✔ | ✔ |
Image formats
You can use image data for prompting with Gemini 1.5 models. When you use images for prompting, they are subject to the following limitations and requirements:
- Images must be in one of the following image data MIME types:
  - PNG - image/png
  - JPEG - image/jpeg
  - WEBP - image/webp
  - HEIC - image/heic
  - HEIF - image/heif
- Maximum of 3600 images for the Gemini 1.5 models.
- No specific limits to the number of pixels in an image; however, larger images are scaled down to fit a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.
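To see what the scaling rule in the last item implies for a given image, the sketch below computes the dimensions after fitting an image inside 3072 x 3072 while keeping its aspect ratio. The function name is hypothetical, and truncating to whole pixels is an assumption. This guide does not say how the service rounds.

```go
package main

import "fmt"

// maxSide is the maximum resolution per side stated in this guide.
const maxSide = 3072

// scaledSize returns the dimensions after fitting (w, h) inside
// maxSide x maxSide while preserving the aspect ratio. Images that already
// fit are returned unchanged. Truncating to whole pixels is an assumption.
func scaledSize(w, h int) (int, int) {
	if w <= maxSide && h <= maxSide {
		return w, h
	}
	if w >= h {
		// Width is the limiting side.
		return maxSide, h * maxSide / w
	}
	// Height is the limiting side.
	return w * maxSide / h, maxSide
}

func main() {
	fmt.Println(scaledSize(6144, 3072)) // landscape, twice the limit
	fmt.Println(scaledSize(1024, 768))  // already fits
	fmt.Println(scaledSize(3000, 6000)) // portrait, height over the limit
}
```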
Audio formats
You can use audio data for prompting with the Gemini 1.5 models. When you use audio for prompting, it is subject to the following limitations and requirements:
- Audio data is supported in the following common audio format MIME types:
  - WAV - audio/wav
  - MP3 - audio/mp3
  - AIFF - audio/aiff
  - AAC - audio/aac
  - OGG Vorbis - audio/ogg
  - FLAC - audio/flac
- The maximum supported length of audio data in a single prompt is 9.5 hours.
- Audio files are resampled down to a 16 Kbps data resolution, and multiple channels of audio are combined into a single channel.
- There is no specific limit to the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt cannot exceed 9.5 hours.
Video formats
You can use video data for prompting with the Gemini 1.5 models.
Video data is supported in the following common video format MIME types:
- video/mp4
- video/mpeg
- video/mov
- video/avi
- video/x-flv
- video/mpg
- video/webm
- video/wmv
- video/3gpp
The File API service samples videos into images at 1 frame per second (FPS). This sampling rate may change to provide the best inference quality. Individual images take up 258 tokens regardless of resolution and quality.
Plain text formats
The File API supports uploading plain text files with the following MIME types:
- text/plain
- text/html
- text/css
- text/javascript
- application/x-javascript
- text/x-typescript
- application/x-typescript
- text/csv
- text/markdown
- text/x-python
- application/x-python-code
- application/json
- text/xml
- application/rtf
- text/rtf
For plain text files with a MIME type not on the list, you can try specifying one of the above MIME types manually.
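One way to apply this fallback is to look up the MIME type by file extension and default to text/plain when the extension is unknown. The sketch below assumes that approach. The extension-to-type pairing and the function name are illustrative choices, not rules defined by the API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// textMIME maps a few file extensions to MIME types from the list above.
// The extension-to-type pairing is this sketch's choice, not an API rule.
var textMIME = map[string]string{
	".txt":  "text/plain",
	".html": "text/html",
	".css":  "text/css",
	".js":   "text/javascript",
	".ts":   "text/x-typescript",
	".csv":  "text/csv",
	".md":   "text/markdown",
	".py":   "text/x-python",
	".json": "application/json",
	".xml":  "text/xml",
	".rtf":  "text/rtf",
}

// mimeForTextFile returns the listed MIME type for the file's extension, or
// text/plain as a manual fallback for extensions not in the table.
func mimeForTextFile(path string) string {
	ext := strings.ToLower(filepath.Ext(path))
	if m, ok := textMIME[ext]; ok {
		return m
	}
	return "text/plain"
}

func main() {
	for _, p := range []string{"notes.md", "data.csv", "build.log"} {
		fmt.Printf("%s -> %s\n", p, mimeForTextFile(p))
	}
}
```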