Gemma is a family of generative artificial intelligence (AI) models that you can use in a wide variety of generation tasks, including question answering, summarization, and reasoning. Gemma models are provided with open weights and permit responsible commercial use, allowing you to tune and deploy them in your own projects and applications.

The Gemma 3 release includes the following key features. Try it in AI Studio:
- Image and text input: Multimodal capabilities let you input images and text to understand and analyze visual data.
- 128K token context: 16x larger input context for analyzing more data and solving more complex problems.
- Wide language support: Work in your language or expand your AI application's language capabilities with support for over 140 languages.
- Developer friendly model sizes: Choose a model size (1B, 4B, 12B, 27B) and precision level that works best for your task and compute resources.
You can download Gemma 3 models from Kaggle and Hugging Face. For more technical details on Gemma 3, see the Model Card and Technical Report. Earlier versions of Gemma core models are also available for download. For more information, see Previous Gemma models.
Multimodal image and text input
Gemma 3's ability to handle both image and text data lets you tackle more complex analysis and generation tasks. You can use the model to interpret image data, identify objects, extract text data, and complete many other visual-input-to-text-output tasks, as in the sketch below.
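The following is a minimal sketch of multimodal inference using the Hugging Face transformers pipeline API. It assumes a recent transformers release with Gemma 3 support, a GPU with enough memory for the 4B model, and that you have accepted the model license on Hugging Face; the image URL is a placeholder.

```python
import torch
from transformers import pipeline

# Load the instruction-tuned 4B model as an image-text-to-text pipeline.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image and list any text it contains."},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=128)
# The pipeline returns the chat history; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```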
128K token context window
Gemma 3 models can handle prompt inputs of up to 128K tokens, a 16x larger context window than previous Gemma models. This larger context means you can process several multi-page articles, longer single documents, or hundreds of images in a single prompt.
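Before sending a long prompt, you can check how much of the context window it will use by tokenizing it locally. A minimal sketch, assuming the transformers library and access to the Gemma 3 tokenizer; `long_article.txt` is a hypothetical input file:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# Hypothetical input file; substitute your own prompt text.
with open("long_article.txt") as f:
    prompt = f.read()

n_tokens = len(tokenizer(prompt).input_ids)
print(f"{n_tokens} tokens ({n_tokens / 128_000:.0%} of the 128K context window)")
```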
Wide language support
Work in your own language with built-in support for over 140 languages. Gemma 3 is trained on a broader set of languages than previous Gemma versions, letting you take on more visual and text tasks in the languages your customers use.
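As a quick illustration, here is a minimal sketch that prompts the text-only 1B model in Spanish; it assumes a recent transformers release with Gemma 3 support and access to the gated google/gemma-3-1b-it checkpoint. The model generally replies in the prompt's language.

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it", device_map="auto")

# Ask in Spanish: "Summarize in one sentence what machine learning is."
messages = [
    {"role": "user", "content": "Resume en una frase qué es el aprendizaje automático."}
]

output = pipe(messages, max_new_tokens=64)
print(output[0]["generated_text"][-1]["content"])
```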
Parameter sizes and quantization
Gemma 3 models are available in 4 parameter sizes at 5 precision levels, from full 32-bit precision down to 4-bit. The different sizes and precisions represent a set of trade-offs for your AI application. Models with higher parameter counts and bit depths (higher precision) are generally more capable, but cost more to run in terms of processing cycles, memory, and power consumption. Models with lower parameter counts and bit depths (lower precision) are less capable, but may be sufficient for your AI task. The following table details the approximate GPU or TPU memory required to run inference with each size and precision of the Gemma 3 model versions.
Parameters | Full 32-bit | BF16 (16-bit) | SFP8 (8-bit) | Q4_0 (4-bit) | INT4 (4-bit) |
---|---|---|---|---|---|
Gemma 3 1B (text only) | 4 GB | 1.5 GB | 1.1 GB | 892 MB | 861 MB |
Gemma 3 4B | 16 GB | 6.4 GB | 4.4 GB | 3.4 GB | 3.2 GB |
Gemma 3 12B | 48 GB | 20 GB | 12.2 GB | 8.7 GB | 8.2 GB |
Gemma 3 27B | 108 GB | 46.4 GB | 29.1 GB | 21 GB | 19.9 GB |
Table 1. Approximate GPU or TPU memory required to load Gemma 3 models based on parameter count and quantization level (bit depth).
Memory consumption also scales with the number of tokens in your prompt: the more tokens your prompt requires, the more memory is needed, in addition to the memory required to load the model.
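To stay within the smaller budgets in Table 1, you can load a checkpoint at reduced precision. The following is a minimal sketch of 4-bit loading with transformers plus bitsandbytes (NF4 quantization, a different scheme from the Q4_0 and INT4 columns above but with a comparable memory reduction); it assumes a CUDA GPU and access to the gated google/gemma-3-1b-it checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize weights to 4 bits on load; run computation in bfloat16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```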
Previous Gemma models
You can work with previous generations of Gemma models, which are also available from Kaggle and Hugging Face. For more technical details about previous Gemma models, see the following model card pages:
- Gemma 2 Model Card
- Gemma 1 Model Card
Ready to start building? Get started with Gemma models!