Gemma formatting and system instructions

Formatting for instruction tuning

Instruction-tuned (IT) models are trained with a specific formatter that annotates all instruction tuning examples with extra information, both at training and inference time. The formatter has two purposes:

  1. Indicating roles in a conversation, such as the system, user, or assistant roles.
  2. Delineating turns in a conversation, especially in a multi-turn conversation.

Below, we specify the control tokens used by Gemma and their use cases. Note that the control tokens are reserved in and specific to our tokenizer.

  • Token to indicate a user turn: user
  • Token to indicate a model turn: model
  • Token to indicate the beginning of a dialogue turn: <start_of_turn>
  • Token to indicate the end of a dialogue turn: <end_of_turn>

Here's an example dialogue:

<start_of_turn>user
knock knock<end_of_turn>
<start_of_turn>model
who is there<end_of_turn>
<start_of_turn>user
Gemma<end_of_turn>
<start_of_turn>model
Gemma who?<end_of_turn>

The sequence "<end_of_turn>\n" is the turn separator, and the prompt prefix is "<start_of_turn>model\n". This means that if you'd like to prompt the model with a question such as "What is Cramer's Rule?", you should feed the model the following:

"<start_of_turn>user
What is Cramer's Rule?<end_of_turn>
<start_of_turn>model"

Note that if you want to fine-tune the base pretrained Gemma models with your own data, you can use any schema you like for control tokens, as long as it's consistent between your training and inference use cases.
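As a purely hypothetical illustration, a custom schema for fine-tuning a base model might look like the sketch below; the "### Instruction:" and "### Response:" markers are an arbitrary choice on our part, not something Gemma expects:

def format_custom_example(instruction, response=""):
    # Arbitrary markers; any schema works as long as the same formatting
    # is applied at both training and inference time.
    return f"### Instruction:\n{instruction}\n### Response:\n{response}"

# Training example includes the response; at inference time it is left
# empty so the model completes it.
train_text = format_custom_example("Summarize the article.", "The article ...")
infer_text = format_custom_example("Summarize the article.")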

System instructions

For both supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), the models were not trained with system instructions. As a result, the only relevant formatting tokens for Gemma are <start_of_turn>, <end_of_turn>, user, and model. For example:

<start_of_turn>user
What is Cramer's Rule?<end_of_turn>
<start_of_turn>model
Cramer's Rule is ...<end_of_turn>

Formatting for FIM tasks

The CodeGemma 2B and 7B variants are specially tuned for code infilling tasks.

Specifically, they are trained on four formatting control tokens that you can use to help construct model prompts for fill-in-the-middle (FIM) coding tasks.

Context          Token
FIM prefix       <|fim_prefix|>
FIM suffix       <|fim_suffix|>
FIM middle       <|fim_middle|>
File separator   <|file_separator|>

Use the FIM tokens to define the cursor location and surrounding context around it for CodeGemma to perform code infilling. Use the file separator token for multi-file contexts.

Example - Construct a FIM prompt

This section reuses the example in the Keras CodeGemma quickstart to show you how to construct a prompt for FIM tasks.

Consider the following code:

import | # Line 1
if __name__ == '__main__': # Line 2
   sys.exit(0) # Line 3

The | indicates the location of the cursor, which is where the code needs to be completed. Note that there is a space before the cursor and that lines 1 and 2 have newlines at the end.

The prefix is then:

import

with one space at the end.

The suffix is:


if __name__ == '__main__':
   sys.exit(0)

with a new line at the start.

The prompt should be constructed as:

<|fim_prefix|>import <|fim_suffix|>
if __name__ == '__main__':
   sys.exit(0)<|fim_middle|>

Note that:

  • There should be no extra whitespace between the FIM tokens and the prefix or suffix
  • The FIM middle token should come at the end to prime the model to continue filling in
  • The prefix or the suffix can be empty, depending on where the cursor is in the file and how much context you want to provide the model
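Putting these notes together, here is a minimal Python sketch of this construction for the example above; the build_fim_prompt helper is our own illustration, not part of any CodeGemma library:

def build_fim_prompt(prefix, suffix):
    # No extra whitespace is inserted around the FIM tokens; the trailing
    # <|fim_middle|> token primes the model to fill in the gap.
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Split the example file at the cursor (just after "import ").
code = "import \nif __name__ == '__main__':\n   sys.exit(0)"
cursor = len("import ")
prompt = build_fim_prompt(code[:cursor], code[cursor:])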

Understanding model output

The model response for the example above would be:

<|fim_prefix|>import <|fim_suffix|>\nif __name__ == "__main__":\n    sys.exit(0)<|fim_middle|>sys\n<|file_separator|>

The model repeats the input prompt and provides sys as the code completion.

When using the CodeGemma models for FIM tasks, stream response tokens and use the FIM or file separator tokens as delimiters to stop streaming and get the resulting code completion.
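As a rough illustration of that post-processing, assuming the raw decoded text is already available as a single string rather than a token stream (the helper below is hypothetical):

# Cut the generated text at the first FIM or file separator token to
# recover just the completion (e.g. "sys\n" in the example above).
STOP_TOKENS = ("<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>", "<|file_separator|>")

def extract_completion(generated, prompt):
    # Drop the echoed prompt if the model repeats it, then trim at the
    # first stop token that appears in the remainder.
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    for token in STOP_TOKENS:
        cut = completion.find(token)
        if cut != -1:
            completion = completion[:cut]
    return completion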