Gemma instruction-tuned (IT) models are trained with a specific formatter that
annotates all instruction tuning examples with extra information, both at
training and inference time. The formatter has two purposes:
1. Indicating roles in a conversation, such as the *system*, *user*, or
   *assistant* roles.
2. Delineating turns in a conversation, especially in a multi-turn
   conversation.
Below, we specify the control tokens used by Gemma and their use cases. Note
that the control tokens are reserved in, and specific to, the Gemma tokenizer.
- Token to indicate a user turn: `user`
- Token to indicate a model turn: `model`
- Token to indicate the beginning of a dialogue turn: `<start_of_turn>`
- Token to indicate the end of a dialogue turn: `<end_of_turn>`
Here's an example dialogue:
<start_of_turn>user
knock knock<end_of_turn>
<start_of_turn>model
who is there<end_of_turn>
<start_of_turn>user
Gemma<end_of_turn>
<start_of_turn>model
Gemma who?<end_of_turn>
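To make the format concrete, here is a minimal Python sketch that renders a
list of turns into this layout. The `format_turns` helper is illustrative
only, not part of a Gemma library:

```python
# A sketch: render (role, text) turns into Gemma's control-token format.
def format_turns(turns):
    """turns: list of (role, text) pairs, where role is 'user' or 'model'."""
    return "".join(
        f"<start_of_turn>{role}\n{text}<end_of_turn>\n" for role, text in turns
    )

dialogue = [
    ("user", "knock knock"),
    ("model", "who is there"),
    ("user", "Gemma"),
    ("model", "Gemma who?"),
]
print(format_turns(dialogue))  # reproduces the example dialogue above
```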
The token "<end_of_turn>\n" is the turn separator, and the prompt prefix is
"<start_of_turn>model\n". This means that if you'd like to prompt the model
with a question like, "What is Cramer's Rule?", you should instead feed the
model as follows:
"<start_of_turn>user
What is Cramer's Rule?<end_of_turn>
<start_of_turn>model"
Note that if you want to fine-tune the pretrained Gemma models with your own
data, you can use any schema for control tokens, as long as it's consistent
between your training and inference use cases, as in the sketch below.
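For example, a fine-tuning pipeline might define its schema once and reuse it
when building both training examples and inference prompts. The control tokens
here are hypothetical, chosen only to illustrate consistency:

```python
# A sketch with hypothetical control tokens (not Gemma's): any schema
# works as long as training and inference share it exactly.
TURN_START = "<turn>"
TURN_END = "</turn>"

def to_training_example(question: str, answer: str) -> str:
    """Format a (question, answer) pair for fine-tuning."""
    return (
        f"{TURN_START}user\n{question}{TURN_END}\n"
        f"{TURN_START}model\n{answer}{TURN_END}\n"
    )

def to_inference_prompt(question: str) -> str:
    """Format a prompt at inference time; must match the training schema."""
    return f"{TURN_START}user\n{question}{TURN_END}\n{TURN_START}model\n"
```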
System instructions
Gemma's instruction-tuned models are designed to work with only two roles:
`user` and `model`. Therefore, a separate `system` role or system turn is not
supported.
Instead of using a separate system role, provide system-level instructions
directly within the initial user prompt. The model's instruction-following
capabilities allow Gemma to interpret the instructions effectively. For example:
<start_of_turn>user
Only reply like a pirate.
What is the answer to life the universe and everything?<end_of_turn>
<start_of_turn>model
Arrr, 'tis 42!<end_of_turn>
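If your application tracks a system message separately, one workaround is to
fold it into the first user turn before formatting. Here is a minimal sketch;
the `merge_system_message` helper is our own, not an official API:

```python
# A sketch: merge an application-level system message into the first
# user turn, since Gemma's formatter has no system role.
def merge_system_message(system_text, messages):
    """messages: list of {'role': ..., 'content': ...} dicts."""
    merged = list(messages)
    if system_text and merged and merged[0]["role"] == "user":
        merged[0] = {
            "role": "user",
            "content": f"{system_text}\n\n{merged[0]['content']}",
        }
    return merged

messages = merge_system_message(
    "Only reply like a pirate.",
    [{"role": "user",
      "content": "What is the answer to life the universe and everything?"}],
)
# messages[0]['content'] now begins with the pirate instruction,
# ready to be formatted as a single user turn.
```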
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-03-21 UTC."],[],[],null,["# Gemma formatting and system instructions\n\nGemma instruction-tuned (IT) models are trained with a specific *formatter* that\nannotates all instruction tuning examples with extra information, both at\ntraining and inference time. The formatter has two purposes:\n\n1. Indicating roles in a conversation, such as the *system* , *user* , or *assistant* roles.\n2. Delineating turns in a conversation, especially in a multi-turn conversation.\n\nBelow, we specify the control tokens used by Gemma and their use cases. Note\nthat the control tokens are reserved in and specific to our tokenizer.\n\n- Token to indicate a user turn: `user`\n- Token to indicate a model turn: `model`\n- Token to indicate the beginning of dialogue turn: `\u003cstart_of_turn\u003e`\n- Token to indicate the end of dialogue turn: `\u003cend_of_turn\u003e`\n\nHere's an example dialogue: \n\n \u003cstart_of_turn\u003euser\n knock knock\u003cend_of_turn\u003e\n \u003cstart_of_turn\u003emodel\n who is there\u003cend_of_turn\u003e\n \u003cstart_of_turn\u003euser\n Gemma\u003cend_of_turn\u003e\n \u003cstart_of_turn\u003emodel\n Gemma who?\u003cend_of_turn\u003e\n\nThe token `\"\u003cend_of_turn\u003e\\n\"` is the turn separator, and the prompt prefix is\n`\"\u003cstart_of_turn\u003emodel\\n\"`. This means that if you'd like to prompt the model\nwith a question like, \"What is Cramer's Rule?\", you should instead feed the\nmodel as follows: \n\n \"\u003cstart_of_turn\u003euser\n What is Cramer's Rule?\u003cend_of_turn\u003e\n \u003cstart_of_turn\u003emodel\"\n\nNote that if you want to finetune the pretrained Gemma models with your own\ndata, you can use any such schema for control tokens, as long as it's consistent\nbetween your training and inference use cases.\n\nSystem instructions\n-------------------\n\nGemma's instruction-tuned models are designed to work with only two roles:\n`user` and `model`. Therefore, the `system` role or a system turn is not\nsupported.\n\nInstead of using a separate system role, provide system-level instructions\ndirectly within the initial user prompt. The model instruction following\ncapabilities allow Gemma to interpret the instructions effectively. For example: \n\n \u003cstart_of_turn\u003euser\n Only reply like a pirate.\n\n What is the answer to life the universe and everything?\u003cend_of_turn\u003e\n \u003cstart_of_turn\u003emodel\n Arrr, 'tis 42,\u003cend_of_turn\u003e"]]