Gemma is a family of generative artificial intelligence (AI) models that you can use in a wide variety of generation tasks, including question answering, summarization, and reasoning. Gemma models are provided with open weights and permit responsible commercial use, allowing you to tune and deploy them in your own projects and applications.

The Gemma 3 release includes the following key features. Try it in AI Studio:
- Image and text input: Multimodal capabilities let you input images and text to understand and analyze visual data.
- 128K token context: 16x larger input context for analyzing more data and solving more complex problems.
- Wide language support: Work in your language or expand your AI application's language capabilities with support for over 140 languages.
- Developer friendly model sizes: Choose a model size (1B, 4B, 12B, 27B) and precision level that works best for your task and compute resources.
You can download Gemma 3 models from Kaggle and Hugging Face. For more technical details on Gemma 3, see the Model Card and Technical Report. Earlier versions of Gemma core models are also available for download. For more information, see Previous Gemma models.
Multimodal image and text input
Gemma 3's ability to handle both image and text data lets you tackle more complex analysis and generation tasks. You can use the model to interpret image data, identify objects, extract text data, and complete many other visual-input-to-text-output tasks, as in the sketch below.
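The following is a minimal sketch of multimodal inference using the Hugging Face transformers pipeline API. It assumes a recent transformers release with Gemma 3 support, a GPU with enough memory for the 4B model, and that you have accepted the model license on Hugging Face; the image URL is a placeholder.

```python
import torch
from transformers import pipeline

# Load the instruction-tuned 4B model as an image-text-to-text pipeline.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image and list any text it contains."},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=128)
# The pipeline returns the chat history; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```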
128K token context window
Gemma 3 models can handle prompt inputs of up to 128K tokens, a 16x larger context window than previous Gemma models. This larger context means you can process several multi-page articles, longer single documents, or hundreds of images in a single prompt.
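Before sending a long prompt, you can check how much of the context window it will use by tokenizing it locally. A minimal sketch, assuming the transformers library and access to the Gemma 3 tokenizer; `long_article.txt` is a hypothetical input file:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# Hypothetical input file; substitute your own prompt text.
with open("long_article.txt") as f:
    prompt = f.read()

n_tokens = len(tokenizer(prompt).input_ids)
print(f"{n_tokens} tokens ({n_tokens / 128_000:.0%} of the 128K context window)")
```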
Wide language support
Work in your own language with built-in support for over 140 languages. Gemma 3 is trained on a broader set of languages than previous Gemma versions, letting you take on more visual and text tasks in the languages your customers use.
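As a quick illustration, here is a minimal sketch that prompts the text-only 1B model in Spanish; it assumes a recent transformers release with Gemma 3 support and access to the gated google/gemma-3-1b-it checkpoint. The model generally replies in the prompt's language.

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it", device_map="auto")

# Ask in Spanish: "Summarize in one sentence what machine learning is."
messages = [
    {"role": "user", "content": "Resume en una frase qué es el aprendizaje automático."}
]

output = pipe(messages, max_new_tokens=64)
print(output[0]["generated_text"][-1]["content"])
```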
Parameter sizes and quantization
Gemma 3 models are available in 4 parameter sizes at 5 precision levels, from full 32-bit precision down to 4-bit. The different sizes and precisions represent a set of trade-offs for your AI application. Models with higher parameter counts and bit depths (higher precision) are generally more capable, but cost more to run in terms of processing cycles, memory, and power consumption. Models with lower parameter counts and bit depths (lower precision) are less capable, but may be sufficient for your AI task. The following table details the approximate GPU or TPU memory required to run inference with each size and precision of the Gemma 3 model versions.
Parameters | Full 32-bit | BF16 (16-bit) | SFP8 (8-bit) | Q4_0 (4-bit) | INT4 (4-bit) |
---|---|---|---|---|---|
Gemma 3 1B (text only) | 4 GB | 1.5 GB | 1.1 GB | 892 MB | 861 MB |
Gemma 3 4B | 16 GB | 6.4 GB | 4.4 GB | 3.4 GB | 3.2 GB |
Gemma 3 12B | 48 GB | 20 GB | 12.2 GB | 8.7 GB | 8.2 GB |
Gemma 3 27B | 108 GB | 46.4 GB | 29.1 GB | 21 GB | 19.9 GB |
Table 1. Approximate GPU or TPU memory required to load Gemma 3 models based on parameter count and quantization level (bit depth).
Memory consumption also scales with the number of tokens in your prompt: the more tokens your prompt requires, the more memory is needed, in addition to the memory required to load the model.
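To stay within the smaller budgets in Table 1, you can load a checkpoint at reduced precision. The following is a minimal sketch of 4-bit loading with transformers plus bitsandbytes (NF4 quantization, a different scheme from the Q4_0 and INT4 columns above but with a comparable memory reduction); it assumes a CUDA GPU and access to the gated google/gemma-3-1b-it checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize weights to 4 bits on load; run computation in bfloat16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```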
Previous Gemma models
You can work with previous generations of Gemma models, which are also available from Kaggle and Hugging Face. For more technical details about previous Gemma models, see the following model card pages:
- Gemma 2 Model Card
- Gemma 1 Model Card
Ready to start building? Get started with Gemma models!