Gemma Open Models
A family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models
Try Gemma 2
Redesigned for outsized performance and unmatched efficiency, Gemma 2 optimizes for blazing-fast inference on diverse hardware.
MMLU (5-shot)
The MMLU benchmark is a test that measures the breadth of knowledge and problem-solving ability acquired by large language models during pretraining.
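"5-shot" means the model sees five solved example questions in its prompt before the question it is scored on. The sketch below shows the idea for a multiple-choice benchmark such as MMLU. The question/choices/"Answer:" layout, the helper names `format_question` and `build_k_shot_prompt`, and the toy questions are hypothetical illustrations, not the exact prompt format used in the Gemma evaluations.

```python
from typing import Optional, Sequence, Tuple

LETTERS = "ABCD"  # assumed four-option multiple-choice format

def format_question(question: str, choices: Sequence[str], answer: Optional[str] = None) -> str:
    """Render one item; solved examples include the answer letter, the test item does not."""
    lines = [question]
    # zip stops at four options because LETTERS has four letters
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)

def build_k_shot_prompt(
    examples: Sequence[Tuple[str, Sequence[str], str]],
    test_question: str,
    test_choices: Sequence[str],
    k: int = 5,
) -> str:
    """Hypothetical k-shot prompt: k solved examples, then the unanswered test item."""
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    shots = [format_question(q, c, a) for q, c, a in examples[:k]]
    shots.append(format_question(test_question, test_choices))
    return "\n\n".join(shots)
```

The model's next token after the final "Answer:" is then compared with the correct letter; with k=0 the same function produces a zero-shot prompt.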
ARC-c (25-shot)
The ARC-c benchmark is the Challenge partition of the AI2 Reasoning Challenge (ARC) dataset. It contains only questions that both a retrieval-based algorithm and a word co-occurrence algorithm answered incorrectly, which makes it harder than ARC-e.
GSM8K (5-shot)
The GSM8K benchmark tests a language model's ability to solve grade-school-level math problems that frequently require multiple steps of reasoning.
AGIEval (3-5-shot)
The AGIEval benchmark tests a language model's general intelligence by using questions derived from real-world exams designed to assess human intellectual abilities.
BBH (3-shot, CoT)
The BBH (BIG-Bench Hard) benchmark focuses on tasks deemed beyond the abilities of current language models, testing their limits across various reasoning and understanding domains.
DROP (3-shot, F1)
DROP is a reading comprehension benchmark that requires discrete reasoning over paragraphs.
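DROP is scored with F1 rather than exact match, so a partially correct answer earns partial credit. The sketch below computes a simplified bag-of-words token F1 between a predicted and a gold answer. The normalization steps (lowercasing, stripping punctuation and the English articles "a", "an", "the") are assumptions borrowed from SQuAD-style scoring; the official DROP metric also normalizes numbers and aligns multi-span answers, which this sketch omits.

```python
import re
import string
from collections import Counter
from typing import List

def normalize(text: str) -> List[str]:
    """Simplified normalization: lowercase, drop punctuation and articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall over the bag of normalized tokens."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens and not gold_tokens:
        return 1.0  # both empty: treat as a perfect match
    # Multiset intersection counts each shared token at most min(pred, gold) times
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "Broncos" against the gold answer "Denver Broncos" has precision 1 and recall 0.5, so it scores an F1 of 2/3 instead of the 0 that exact match would give.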
Winogrande (5-shot)
The Winogrande benchmark tests a language model's ability to resolve ambiguous fill-in-the-blank tasks with binary options, requiring generalized commonsense reasoning.
HellaSwag (10-shot)
The HellaSwag benchmark challenges a language model's ability to understand and apply commonsense reasoning by selecting the most logical ending to a story.
MATH (4-shot)
MATH evaluates a language model's ability to solve complex mathematical word problems, requiring reasoning, multi-step problem-solving, and the understanding of mathematical concepts.
ARC-e (0-shot)
The ARC-e benchmark tests a language model's advanced question-answering skills with genuine grade-school-level, multiple-choice science questions.
PIQA (0-shot)
The PIQA benchmark tests a language model's ability to understand and apply physical commonsense knowledge by answering questions about everyday physical interactions.
SIQA (0-shot)
The SIQA benchmark evaluates a language model's understanding of social interactions and social common sense by asking questions about people’s actions and their social implications.
BoolQ (0-shot)
The BoolQ benchmark tests a language model's ability to answer naturally occurring yes/no questions, probing how well the model performs real-world natural language inference.
TriviaQA (5-shot)
The TriviaQA benchmark tests reading comprehension skills with question-answer-evidence triples.
NQ (5-shot)
The NQ (Natural Questions) benchmark tests a language model's ability to find and comprehend answers within entire Wikipedia articles, simulating real-world question-answering scenarios.
HumanEval (pass@1)
The HumanEval benchmark tests a language model's code generation abilities by evaluating whether its solutions pass functional unit tests for programming problems.
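pass@1 is the share of problems for which a generated solution passes all of the problem's unit tests. When n samples are drawn per problem and c of them pass, a widely used unbiased estimate of pass@k is 1 - C(n - c, k) / C(n, k), as introduced with HumanEval (Chen et al., 2021). The sketch below implements that estimator; the function names are illustrative, and it is not necessarily the exact evaluation setup behind the scores on this page.

```python
from math import comb
from typing import Iterable, Tuple

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the unit tests."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Benchmark score: average per-problem pass@k over (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("no problems to score")
    return sum(scores) / len(scores)
```

With k=1 the estimator reduces to c / n, so pass@1 with a single sample per problem is simply the fraction of problems solved.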
MBPP (3-shot)
The MBPP benchmark tests a language model's ability to solve basic Python programming problems, focusing on fundamental programming concepts and standard library usage.
[Chart: pre-trained model scores on the benchmarks above, on a 0% to 100% scale, comparing Gemma 1 2.5B, Gemma 2 2.6B, Mistral 7B, LLAMA 3 8B, Gemma 1 7B, Gemma 2 9B, and Gemma 2 27B. LLAMA 3 8B is not shown for every benchmark.]
*These benchmark results are for the pre-trained models. See the technical report for details on performance with other methodologies.
Research Models
Discover the extended Gemma model family
Explore the Gemmaverse
A vast ecosystem of community-created Gemma models and tools, ready to power and inspire your innovation
Build
Start building with Gemma
Deploy models
Choose your deployment target
Mobile
Deploy on-device with Google AI Edge
Deploy directly to devices for low-latency, offline functionality. Ideal for applications requiring real-time responsiveness and privacy, such as mobile apps, IoT devices, and embedded systems.
Web
Integrate seamlessly into web applications
Empower your websites and web services with advanced AI capabilities, enabling interactive features, personalized content, and intelligent automation.
Cloud
Scale effortlessly with cloud infrastructure
Leverage the scalability and flexibility of the cloud to handle large-scale deployments, demanding workloads, and complex AI applications.