Gemma Open Models

A family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.


Responsible by design

Incorporating comprehensive safety measures, these models help ensure responsible and trustworthy AI solutions through curated datasets and rigorous tuning.


Unmatched performance at size

Gemma models achieve exceptional benchmark results at the 2B, 7B, 9B, and 27B sizes, even outperforming some larger open models.


Flexible deployment

Deploy seamlessly to mobile, web, and cloud using Keras, JAX, MediaPipe, PyTorch, Hugging Face, and more.
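For example, a Gemma 2 checkpoint can be loaded in a few lines with Hugging Face Transformers. This is a minimal sketch, assuming the transformers, torch, and accelerate packages are installed and access to the gated google/gemma-2-2b-it weights has been granted:

    # Minimal sketch: run a Gemma 2 instruction-tuned checkpoint with
    # Hugging Face Transformers. Assumes `transformers`, `torch`, and
    # `accelerate` are installed and the gated weights are accessible.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-2b-it"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))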

Try Gemma 2

Redesigned for outsized performance and unmatched efficiency, Gemma 2 is optimized for blazing-fast inference on diverse hardware.

MMLU (5-shot): The MMLU benchmark measures the breadth of knowledge and problem-solving ability acquired by large language models during pretraining.

ARC-C (25-shot): The ARC-C benchmark is a more challenging subset of the ARC dataset, containing only questions answered incorrectly by common (retrieval-based and word co-occurrence) algorithms.

GSM8K (5-shot): The GSM8K benchmark tests a language model's ability to solve grade-school-level math problems that frequently require multiple steps of reasoning.

AGIEval (3-5-shot): The AGIEval benchmark tests a language model's general intelligence using questions derived from real-world exams designed to assess human intellectual abilities.

BBH (3-shot, CoT): The BBH (BIG-Bench Hard) benchmark focuses on tasks deemed beyond the abilities of current language models, testing their limits across various reasoning and understanding domains.

DROP (3-shot, F1): DROP is a reading-comprehension benchmark that requires discrete reasoning over paragraphs.

Winogrande (5-shot): The Winogrande benchmark tests a language model's ability to resolve ambiguous fill-in-the-blank tasks with binary options, requiring generalized commonsense reasoning.

HellaSwag (10-shot): The HellaSwag benchmark challenges a language model's ability to apply commonsense reasoning by selecting the most logical ending to a story.

MATH (4-shot): MATH evaluates a language model's ability to solve complex mathematical word problems, requiring reasoning, multi-step problem-solving, and an understanding of mathematical concepts.

ARC-e (0-shot): The ARC-e benchmark tests a language model's question-answering skills with genuine grade-school-level, multiple-choice science questions.

PIQA (0-shot): The PIQA benchmark tests a language model's ability to understand and apply physical commonsense knowledge by answering questions about everyday physical interactions.

SIQA (0-shot): The SIQA benchmark evaluates a language model's understanding of social interactions and social common sense by asking questions about people's actions and their social implications.

BoolQ (0-shot): The BoolQ benchmark tests a language model's ability to answer naturally occurring yes/no questions, probing its capacity for real-world natural language inference.

TriviaQA (5-shot): The TriviaQA benchmark tests reading-comprehension skills with question-answer-evidence triples.

NQ (5-shot): The NQ (Natural Questions) benchmark tests a language model's ability to find and comprehend answers within entire Wikipedia articles, simulating real-world question-answering scenarios.

HumanEval (pass@1): The HumanEval benchmark tests a language model's code-generation abilities by evaluating whether its solutions pass functional unit tests for programming problems.

MBPP (3-shot): The MBPP benchmark tests a language model's ability to solve basic Python programming problems, focusing on fundamental programming concepts and standard-library usage.
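The shot counts above refer to how many worked examples are placed in the prompt before the test question. A minimal sketch of how an n-shot prompt is typically assembled; the demo questions here are illustrative, not drawn from any benchmark:

    # Sketch of n-shot prompt construction: n solved examples precede the
    # unanswered test question. Purely illustrative; real evaluation
    # harnesses also handle answer extraction and scoring.
    def build_few_shot_prompt(examples, question, n_shots=5):
        shots = examples[:n_shots]
        parts = [f"Q: {q}\nA: {a}" for q, a in shots]
        parts.append(f"Q: {question}\nA:")
        return "\n\n".join(parts)

    demo = [("What is 2 + 2?", "4"), ("What is the capital of France?", "Paris")]
    print(build_few_shot_prompt(demo, "What gas do plants absorb?", n_shots=2))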

Benchmark results* (pre-trained models; n/a = score not reported):

Benchmark (setting)     Gemma 1 2.5B   Gemma 2 2.6B   Mistral 7B   LLAMA 3 8B   Gemma 1 7B   Gemma 2 9B   Gemma 2 27B
MMLU (5-shot)                   42.3           51.3         62.5         66.6         64.4         71.3          75.2
ARC-C (25-shot)                 48.5           55.4         60.5         59.2         61.1         68.4          71.4
GSM8K (5-shot)                  15.1           23.9         39.6         45.7         51.8         68.6          74.0
AGIEval (3-5-shot)              24.2           30.6         44.0         45.9         44.9         52.8          55.1
BBH (3-shot, CoT)               35.2           41.9         56.0         61.1         59.0         68.2          74.9
DROP (3-shot, F1)               48.5           52.0         63.8         58.4         56.3         69.4          74.2
Winogrande (5-shot)             66.8           70.9         78.5         76.1         79.0         80.6          83.7
HellaSwag (10-shot)             71.7           73.0         83.0         82.0         82.3         81.9          86.4
MATH (4-shot)                   11.8           15.0         12.7          n/a         24.3         36.6          42.3
ARC-e (0-shot)                  73.2           80.1         80.5          n/a         81.5         88.0          88.6
PIQA (0-shot)                   77.3           77.8         82.2          n/a         81.2         81.7          83.2
SIQA (0-shot)                   49.7           51.9         47.0          n/a         51.8         53.4          53.7
BoolQ (0-shot)                  69.4           72.5         83.2          n/a         83.2         84.2          84.8
TriviaQA (5-shot)               53.2           59.4         62.5          n/a         63.4         76.6          83.7
NQ (5-shot)                     12.5           16.7         23.2          n/a         23.0         29.2          34.5
HumanEval (pass@1)              22.0           17.7         26.2          n/a         32.3         40.2          51.8
MBPP (3-shot)                   29.2           29.6         40.2          n/a         44.4         52.4          62.6

*These results are for the pre-trained models; see the technical report for details on performance with other methodologies.
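The pass@1 metric used for HumanEval above reports the fraction of problems for which the model's first generated solution passes the unit tests. A sketch of the standard unbiased pass@k estimator (Chen et al., 2021), of which pass@1 is the k=1 case:

    # Unbiased pass@k estimator: given n generated samples of which c pass
    # the unit tests, estimate the probability that at least one of k
    # randomly drawn samples passes. For k=1 this reduces to c/n.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        if n - c < k:
            return 1.0  # every size-k draw must contain a passing sample
        return 1.0 - comb(n - c, k) / comb(n, k)

    print(pass_at_k(n=10, c=3, k=1))  # 0.3, i.e. c/n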

PaliGemma 2 (new)

PaliGemma 2 brings easily fine-tunable vision capabilities to the Gemma 2 language models, enabling a wide range of applications that combine text and image understanding.
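A minimal inference sketch using the Hugging Face Transformers integration; the model ID, image URL, and "caption en" task prompt are assumptions that may differ by checkpoint, so check the model card for the exact format:

    # Sketch: image captioning with a PaliGemma 2 checkpoint via Hugging
    # Face Transformers. Model ID, image URL, and prompt are assumptions.
    import requests
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = "google/paligemma2-3b-pt-224"
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
    inputs = processor(text="caption en", images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    print(processor.decode(out[0], skip_special_tokens=True))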

DataGemma

DataGemma is the first family of open models designed to connect LLMs with extensive real-world data drawn from Google's Data Commons.

Gemma Scope

Gemma Scope offers researchers unprecedented transparency into the decision-making processes of our Gemma 2 models.

Deploy models

Choose your deployment target

Mobile

Deploy on-device with Google AI Edge

Deploy directly to devices for low-latency, offline functionality. Ideal for applications requiring real-time responsiveness and privacy, such as mobile apps, IoT devices, and embedded systems.

Web

Integrate seamlessly into web applications

Empower your websites and web services with advanced AI capabilities, enabling interactive features, personalized content, and intelligent automation.

Cloud

Scale effortlessly with cloud infrastructure

Leverage the scalability and flexibility of the cloud to handle large-scale deployments, demanding workloads, and complex AI applications.
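Cloud deployments often pair Gemma with a high-level framework. A minimal sketch using KerasNLP on the JAX backend, assuming the keras-nlp and jax packages are installed and that the "gemma2_instruct_2b_en" preset name matches your KerasNLP release:

    # Sketch: run a Gemma 2 preset with KerasNLP on the JAX backend.
    # The preset name may differ across KerasNLP versions.
    import os
    os.environ["KERAS_BACKEND"] = "jax"  # assumption: JAX is installed

    import keras_nlp

    gemma = keras_nlp.models.GemmaCausalLM.from_preset("gemma2_instruct_2b_en")
    print(gemma.generate("Explain why the sky is blue.", max_length=128))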

Unlocking global communication

Join our global Kaggle competition and create Gemma model variants for a specific language or unique cultural aspect.