Gemini is a family of generative AI models that lets developers generate content and solve problems. These models are designed and trained to handle text, images, and audio as input. This guide provides information about each model variant to help you decide which is the best fit for your use case.
Safety and intended use
Generative artificial intelligence models are powerful tools, but they are not without their limitations. Their versatility means they can sometimes produce unexpected outputs, including outputs that are inaccurate, biased, or offensive. Post-processing and rigorous manual evaluation are essential to limit the risk of harm from such outputs. See the safety guidance for additional safe use suggestions.
The models provided by the Gemini API can be used for a wide variety of generative AI and natural language processing (NLP) applications. These capabilities are available only through the Gemini API or the Google AI Studio web app. Your use of the Gemini API is also subject to the Generative AI Prohibited Use Policy and the Gemini API terms of service.
Model variants
The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available:
Model variant | Input(s) | Output | Optimized for |
---|---|---|---|
Gemini 1.5 Pro (Preview) | Audio, images, and text | Text | Reasoning tasks including (but not limited to) code and text generation, text editing, problem solving, data extraction and generation |
Gemini 1.5 Flash (Preview) | Audio, images, and text | Text | Fast and versatile performance across a diverse variety of tasks |
Gemini 1.0 Pro | Text | Text | Natural language tasks, multi-turn text and code chat, and code generation |
Gemini 1.0 Pro Vision | Images and text | Text | Performance optimized for visual-related tasks, like generating image descriptions or identifying objects in images |
Text Embedding | Text | Text embeddings | Generate elastic text embeddings with up to 768 dimensions for text up to 2,048 tokens |
Embedding | Text | Text embeddings | Generate text embeddings with 768 dimensions for text up to 2,048 tokens |
AQA | Text | Text | Perform Attributed Question-Answering–related tasks over provided text |
The following table describes the attributes of the Gemini models that are common to all model variants:
Attribute | Description |
---|---|
Training data | Gemini's knowledge cutoff is early 2023. Knowledge about events after that time is limited. |
Supported languages | See available languages |
Configurable model parameters | Top p, Top k, Temperature, Stop sequence, Max output length, Number of response candidates |
See the model parameters section of the generative models guide for information about each of these parameters.
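As a hedged sketch of how these parameters fit together, the following assembles them into a plain dictionary such as you might pass as a generation config. The default values here are illustrative assumptions, not documented defaults.

```python
# Hypothetical sketch: bundling the configurable model parameters
# described above into one request-ready dict. Default values are
# assumptions for illustration only.
def make_generation_config(temperature=0.9, top_p=1.0, top_k=32,
                           max_output_tokens=2048, stop_sequences=None,
                           candidate_count=1):
    """Collect sampling parameters for a content-generation request."""
    return {
        "temperature": temperature,          # randomness of sampling
        "top_p": top_p,                      # nucleus sampling cutoff
        "top_k": top_k,                      # sample from the k most likely tokens
        "max_output_tokens": max_output_tokens,
        "stop_sequences": stop_sequences or [],  # strings that end generation
        "candidate_count": candidate_count,  # number of responses to return
    }

config = make_generation_config(temperature=0.2, stop_sequences=["END"])
```

Lower temperatures make output more deterministic; stop sequences cut generation off as soon as one of the given strings appears.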
Gemini 1.5 Pro (Preview)
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide range of reasoning tasks such as:
- Code generation
- Text generation
- Text editing
- Problem solving
- Recommendations generation
- Information extraction
- Data extraction or generation
- Creation of AI agents
1.5 Pro can process large amounts of data at once, including 1 hour of video, 9.5 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words.
1.5 Pro is capable of handling zero-, one-, and few-shot learning tasks.
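The difference between zero-, one-, and few-shot prompting is simply how many worked examples you include before the actual task. A minimal, SDK-independent sketch of assembling such prompts:

```python
# Illustrative sketch (not part of the Gemini API): zero-, one-, and
# few-shot prompts differ only in how many input/output example pairs
# are prepended before the real task.
def build_prompt(task, examples=()):
    """Prepend example input/output pairs to a task description."""
    parts = []
    for example_input, example_output in examples:  # zero pairs -> zero-shot
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {task}\nOutput:")
    return "\n\n".join(parts)

zero_shot = build_prompt("Translate 'bonjour' to English.")
few_shot = build_prompt(
    "Translate 'bonjour' to English.",
    examples=[("Translate 'hola' to English.", "hello"),
              ("Translate 'ciao' to English.", "hello")],
)
```

The resulting string is what you would send as the text part of a prompt; the model infers the pattern from the examples.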
Model details
Property | Description |
---|---|
Model code | models/gemini-1.5-pro-latest |
Inputs | Audio, images, and text |
Output | Text |
Supported generation methods | generateContent |
Input token limit[**] | 1,048,576 |
Output token limit[**] | 8,192 |
Maximum number of images per prompt | 3,600 |
Maximum video length | 1 hour |
Maximum audio length | Approximately 9.5 hours |
Maximum number of audio files per prompt | 1 |
Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
Rate limits[*] | See the rate limits page. |
System instructions | Supported |
JSON mode | Supported |
Latest version | gemini-1.5-pro-latest |
Latest stable version | gemini-1.5-pro |
Latest update | April 2024 |
Gemini 1.5 Flash (Preview)
Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks.
Model details
Property | Description |
---|---|
Model code | gemini-1.5-flash |
Input(s) | Audio, images, and text |
Output | Text |
Supported generation methods | generateContent |
Input token limit[**] | 1,048,576 |
Output token limit[**] | 8,192 |
Maximum number of images per prompt | 3,600 |
Maximum video length | 1 hour |
Maximum audio length | Approximately 9.5 hours |
Maximum number of audio files per prompt | 1 |
Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
Rate limits[*] | See the rate limits page. |
System instructions | Supported |
JSON mode | Supported |
Latest version | gemini-1.5-flash-latest |
Latest stable version | gemini-1.5-flash |
Gemini 1.0 Pro
Gemini 1.0 Pro is an NLP model that handles tasks like multi-turn text and code chat, and code generation.
1.0 Pro is capable of handling zero-, one-, and few-shot learning tasks.
Model details
Property | Description |
---|---|
Model code | models/gemini-pro |
Input | Text |
Output | Text |
Supported generation methods | generateContent (generate_content in the Python SDK) |
Rate limits[*] | See the rate limits page. |
System instructions | Unsupported |
JSON mode | Unsupported |
Latest version | gemini-1.0-pro-latest |
Latest stable version | gemini-1.0-pro |
Stable versions | gemini-1.0-pro-001 |
Latest update | February 2024 |
Gemini 1.0 Pro Vision
Gemini 1.0 Pro Vision is a performance-optimized multimodal model that can perform visual-related tasks. For example, 1.0 Pro Vision can generate image descriptions, identify objects present in images, provide information about places or objects present in images, and more.
1.0 Pro Vision is capable of handling zero-, one-, and few-shot tasks.
Model details
Property | Description |
---|---|
Model code | models/gemini-pro-vision |
Inputs | Text and images |
Output | Text |
Supported generation methods | generateContent (generate_content in the Python SDK) |
Input token limit[**] | 12,288 |
Output token limit[**] | 4,096 |
Maximum image size | No limit |
Maximum number of images per prompt | 16 |
Maximum video length | 2 minutes |
Maximum number of videos per prompt | 1 |
Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
Rate limit[*] | 60 requests per minute |
Latest version | gemini-1.0-pro-vision-latest |
Latest stable version | gemini-1.0-pro-vision |
Latest update | December 2023 |
Text Embedding and Embedding
Text Embedding
You can use the Text Embedding model to generate text embeddings for input text. For more information on the Text Embedding model, visit the Generative AI on Vertex AI documentation about text embeddings.
The Text Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Text Embedding offers elastic embedding sizes under 768. You can use elastic embeddings to generate smaller output dimensions and potentially save computing and storage costs with minor performance loss.
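One common way elastic embeddings are consumed downstream is to keep only the leading dimensions of the full 768-dimension vector and re-normalize to unit length. The sketch below assumes this truncate-and-renormalize pattern; whether renormalization is needed depends on how you compare vectors, and this is an illustration rather than documented API behavior.

```python
import math

# Hedged sketch: shrinking an embedding to fewer dimensions by keeping
# the leading components and rescaling to unit length. This is an
# assumption about downstream usage, not Gemini API behavior.
def shrink_embedding(vector, dims):
    """Keep the first `dims` components and rescale to unit length."""
    truncated = vector[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [0.5, 0.5, 0.5, 0.5]        # stand-in for a 768-dimension embedding
small = shrink_embedding(full, 2)  # store only 2 of the original dimensions
```

Smaller vectors cost less to store and compare, at the price of some retrieval quality.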
Model details
Property | Description |
---|---|
Model code | models/text-embedding-004 (text-embedding-preview-0409 in Vertex AI) |
Input | Text |
Output | Text embeddings |
Input token limit | 2,048 |
Output dimension size | 768 |
Supported generation methods | embedContent (embed_content in the Python SDK) |
Model safety | No adjustable safety settings. |
Rate limit[*] | 1,500 requests per minute |
Latest update | April 2024 |
Embedding
You can use the Embedding model to generate text embeddings for input text.
The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens.
Embedding model details
Property | Description |
---|---|
Model code | models/embedding-001 |
Input | Text |
Output | Text embeddings |
Input token limit | 2,048 |
Output dimension size | 768 |
Supported generation methods | embedContent (embed_content in the Python SDK) |
Model safety | No adjustable safety settings. |
Rate limit[*] | 1,500 requests per minute |
Latest update | December 2023 |
AQA
You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in the provided sources, along with an estimate of the probability that the question is answerable from those sources.
Model details
Property | Description |
---|---|
Model code | models/aqa |
Input | Text |
Output | Text |
Supported generation methods | generateAnswer (via GenerateAnswerRequest) |
Supported languages | English |
Input token limit[**] | 7,168 |
Output token limit[**] | 1,024 |
Model safety | Automatically applied safety settings, which are adjustable by developers. See our page on safety settings for details. |
Rate limit[*] | 60 requests per minute |
Latest update | December 2023 |
See the examples to explore the capabilities of these model variations.
[*] Rate limits are expressed as RPM (requests per minute), TPM (tokens per minute), RPD (requests per day), and TPD (tokens per day). Due to capacity limitations, specified maximum rate limits are not guaranteed.
[**] A token is equivalent to about 4 characters for Gemini models; 100 tokens are about 60-80 English words.
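The 4-characters-per-token rule of thumb can be turned into a quick back-of-the-envelope estimator. This is an approximation only; the API's countTokens method gives the real number.

```python
# Rough sketch of the rule of thumb above: about 4 characters per token
# for Gemini models. An estimate only -- use the API's token-counting
# endpoint when the exact count matters.
def estimate_tokens(text):
    """Approximate token count from character count (~4 chars/token)."""
    return max(1, round(len(text) / 4))

approx = estimate_tokens("The quick brown fox jumps over the lazy dog.")
```

Estimates like this are handy for deciding up front whether a prompt will fit a model's input token limit.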
Model version name patterns
Gemini models are available in either preview or stable versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.
- Latest: Points to the cutting-edge version of the model for a specified generation and variation. The underlying model is updated regularly and might be a preview version. Only exploratory testing apps and prototypes should use this alias. To specify the latest version, use the following pattern: <model>-<generation>-<variation>-latest. For example, gemini-1.0-pro-latest.
- Latest stable: Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-1.0-pro.
- Stable: Points to a specific stable model. Stable models don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-1.0-pro-001.
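The naming patterns above are plain string assembly, which the following sketch makes concrete. The helper function is hypothetical, not part of any SDK.

```python
# Sketch of the <model>-<generation>-<variation>[-latest|-<version>]
# naming patterns described above. The helper is hypothetical; the
# resulting strings match the documented examples.
def model_name(model, generation, variation, version=None, latest=False):
    """Build a Gemini model name string from its components."""
    name = f"{model}-{generation}-{variation}"
    if latest:
        return name + "-latest"     # cutting-edge alias, may be a preview
    if version:
        return f"{name}-{version}"  # pinned stable model, never changes
    return name                     # latest-stable alias

cutting_edge = model_name("gemini", "1.0", "pro", latest=True)
latest_stable = model_name("gemini", "1.0", "pro")
pinned = model_name("gemini", "1.0", "pro", version="001")
```

Production apps would typically construct (or hard-code) the pinned form, since that model never changes underneath them.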