Chat with Gemma: basic and multi-turn conversations

Gemma instruction-tuned (IT) models are designed to handle conversational interactions, from single question-and-answer exchanges to extended multi-turn dialogues. This guide explains how to format prompts for chat with Gemma and how to build multi-turn conversations.

Prompt format

Gemma IT models use special control tokens to delineate conversation turns. These tokens are required when sending prompts directly to the tokenizer, but are typically applied automatically by chat-aware frameworks.

Single-turn prompt

A single-turn prompt consists of one user message and a model response marker:

<start_of_turn>user
What is the speed of light?<end_of_turn>
<start_of_turn>model
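
If you send prompts directly to the tokenizer, you assemble this string yourself. A minimal sketch with Hugging Face Transformers, assuming the google/gemma-3-1b-it checkpoint used later in this guide:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")

# Build the single-turn prompt with the control tokens shown above.
# The tokenizer prepends the required <bos> token automatically.
prompt = (
    "<start_of_turn>user\n"
    "What is the speed of light?<end_of_turn>\n"
    "<start_of_turn>model\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))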

Multi-turn prompt

Multi-turn conversations chain multiple exchanges. Each turn is wrapped in the same control tokens:

<start_of_turn>user
What is the speed of light?<end_of_turn>
<start_of_turn>model
The speed of light in a vacuum is approximately 299,792,458 meters per second.<end_of_turn>
<start_of_turn>user
How long does it take light to reach Earth from the Sun?<end_of_turn>
<start_of_turn>model

The model generates a response for the final <start_of_turn>model turn.
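
When formatting turns manually, the same pattern extends to any number of exchanges. A minimal sketch of a helper that assembles the prompt from a list of turns (the build_prompt function is illustrative, not part of any Gemma API):

# Each turn is a (role, text) pair, where role is "user" or "model".
def build_prompt(turns):
    prompt = ""
    for role, text in turns:
        prompt += f"<start_of_turn>{role}\n{text}<end_of_turn>\n"
    # End with an open model turn so the model generates the next reply.
    return prompt + "<start_of_turn>model\n"

turns = [
    ("user", "What is the speed of light?"),
    ("model", "The speed of light in a vacuum is approximately "
              "299,792,458 meters per second."),
    ("user", "How long does it take light to reach Earth from the Sun?"),
]
print(build_prompt(turns))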

System instructions

Gemma's instruction-tuned models are designed to work with only two roles: user and model. A separate system role or system turn is therefore not supported.

Instead of using a system role, provide system-level instructions, such as a persona or constraints that should hold for the entire conversation, directly at the start of the first user turn. Gemma's instruction-following capabilities allow it to apply these instructions throughout the dialogue. For example:

<start_of_turn>user
Only reply like a pirate.

What is the answer to life the universe and everything?<end_of_turn>
<start_of_turn>model
Arrr, 'tis 42!<end_of_turn>
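
With a message-based API, the same idea translates to merging the instructions into the first user message rather than adding a separate system entry. A minimal sketch (the variable names are illustrative):

system_instruction = "Only reply like a pirate."
question = "What is the answer to life the universe and everything?"

# Gemma has no system role, so the instructions are prepended to the
# first user turn instead of sent as a {"role": "system"} message.
messages = [
    {"role": "user", "content": f"{system_instruction}\n\n{question}"},
]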

For more details, see Prompt and system instructions.

Framework support

Most frameworks handle chat formatting automatically through their chat template or conversation API:

Hugging Face Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")

messages = [
    {"role": "user", "content": "What is machine learning?"},
]

# return_dict=True returns input_ids and attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
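
For a multi-turn conversation, append the model's reply under the assistant role (which Gemma's chat template renders as a model turn) along with the next user message, then apply the template again. A sketch continuing the example above:

# Decode only the newly generated tokens as the model's reply.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)

# Resend the full history; the model is stateless between calls.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "How is it different from deep learning?"})

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)
outputs = model.generate(**inputs, max_new_tokens=256)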

Ollama

ollama run gemma3:1b "What is machine learning?"

For multi-turn chat, use the interactive mode:

ollama run gemma3:1b
>>> What is machine learning?
...
>>> How is it different from deep learning?
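
Ollama also exposes a REST endpoint for programmatic multi-turn chat. A minimal sketch using the /api/chat endpoint with Python's requests library, assuming Ollama is running on its default port (11434):

import requests

# Send the full conversation history on every request; the server
# does not keep chat state between calls.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma3:1b",
        "messages": [
            {"role": "user", "content": "What is machine learning?"},
            # The previous model reply goes here (placeholder text).
            {"role": "assistant", "content": "Machine learning is..."},
            {"role": "user", "content": "How is it different from deep learning?"},
        ],
        "stream": False,
    },
)
print(response.json()["message"]["content"])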

OpenAI-compatible APIs

When using frameworks that expose an OpenAI-compatible API (such as vLLM, llama.cpp, or LM Studio), pass messages using the standard messages format:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="google/gemma-3-1b-it",
    messages=[
        {"role": "user", "content": "What is machine learning?"},
    ],
)
print(response.choices[0].message.content)
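
Multi-turn chat follows the standard OpenAI pattern: append each assistant reply to the messages list and resend the full history with the next user message. Continuing the example above:

messages = [
    {"role": "user", "content": "What is machine learning?"},
    # Keep the previous reply in the history so the model has context.
    {"role": "assistant", "content": response.choices[0].message.content},
    {"role": "user", "content": "How is it different from deep learning?"},
]

response = client.chat.completions.create(
    model="google/gemma-3-1b-it",
    messages=messages,
)
print(response.choices[0].message.content)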

Next steps