##### Copyright 2024 Google LLC.

### Licensed under the Apache License, Version 2.0 (the "License");

View on ai.google.dev | Run in Google Colab | Open in Vertex AI | View source on GitHub |

Large Language Models (LLMs) such as Gemma excel at generating informative responses, making them ideal for building virtual assistants and chatbots.

Conventionally, LLMs operate in a stateless manner, meaning they lack an inherent memory to store past conversations. Each prompt or question is processed independently, disregarding prior interactions. However, a crucial aspect of natural conversation is the ability to retain context from prior interactions. To overcome this limitation and enable LLMs to maintain conversation context, they must be explicitly provided with relevant information such as the conversation history (or pertinent parts) into each new prompt presented to the LLM.

This tutorial shows you how to develop a chatbot using the instruction-tuned model variant of Gemma.

## Setup

### Gemma setup

To complete this tutorial, you'll first need to complete the setup instructions at Gemma setup. The Gemma setup instructions show you how to do the following:

- Get access to Gemma on kaggle.com.
- Select a Colab runtime with sufficient resources to run the Gemma 2B model.
- Generate and configure a Kaggle username and API key.

After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment.

### Set environment variables

Set environment variables for `KAGGLE_USERNAME`

and `KAGGLE_KEY`

.

```
import os
from google.colab import userdata
# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
```

### Install dependencies

Install Keras and KerasNLP.

`# Install Keras 3 last. See https://keras.io/getting_started/ for more details.`

`pip install -q tensorflow-cpu`

`pip install -q -U keras-nlp tensorflow-hub`

`pip install -q -U keras>=3`

`pip install -q -U tensorflow-text`

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 214.0/214.0 MB 5.5 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.3/5.3 MB 86.6 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 70.4 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 88.5 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 64.8 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 311.2/311.2 kB 29.6 MB/s eta 0:00:00 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 3.3.3 which is incompatible. tensorflow 2.15.0 requires ml-dtypes~=0.2.0, but you have ml-dtypes 0.3.2 which is incompatible. tensorflow 2.15.0 requires tensorboard<2.16,>=2.15, but you have tensorboard 2.16.2 which is incompatible. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 515.3/515.3 kB 5.5 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 950.8/950.8 kB 10.4 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 21.2 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 36.8 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 49.2 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 49.0 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 67.1 MB/s eta 0:00:00 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow-cpu 2.16.1 requires keras>=3.0.0, but you have keras 2.15.0 which is incompatible. tensorflow-cpu 2.16.1 requires ml-dtypes~=0.3.1, but you have ml-dtypes 0.2.0 which is incompatible. tensorflow-cpu 2.16.1 requires tensorboard<2.17,>=2.16, but you have tensorboard 2.15.2 which is incompatible. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 3.3.3 which is incompatible. tensorflow-cpu 2.16.1 requires ml-dtypes~=0.3.1, but you have ml-dtypes 0.2.0 which is incompatible. tensorflow-cpu 2.16.1 requires tensorboard<2.17,>=2.16, but you have tensorboard 2.15.2 which is incompatible. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 589.8/589.8 MB 2.7 MB/s eta 0:00:00 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tf-keras 2.15.1 requires tensorflow<2.16,>=2.15, but you have tensorflow 2.16.1 which is incompatible.

### Select a backend

Keras is a high-level, multi-framework deep learning API designed for simplicity and ease of use. Keras 3 lets you choose the backend: TensorFlow, JAX, or PyTorch. All three will work for this tutorial.

```
import os
# Select JAX as the backend
os.environ["KERAS_BACKEND"] = "jax"
# Pre-allocate 100% of TPU memory to minimize memory fragmentation
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.0"
```

### Import packages

Import Keras and KerasNLP.

```
import keras
import keras_nlp
# for reproducibility
keras.utils.set_random_seed(42)
```

### Instantiate the model

KerasNLP provides implementations of many popular model architectures. In this tutorial, you'll instantiate the model using `GemmaCausalLM`

, an end-to-end Gemma model for causal language modeling. A causal language model predicts the next token based on previous tokens.

Instantiate the model using the `from_preset`

method:

```
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_1.1_instruct_2b_en")
```

Attaching 'metadata.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'metadata.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'task.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'config.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'metadata.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'metadata.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'config.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'config.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'metadata.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'metadata.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'preprocessor.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook... Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_1.1_instruct_2b_en/3' to your Colab notebook...

The `GemmaCausalLM.from_preset()`

function instantiates the model from a preset architecture and weights. In the code above, the string `"gemma_1.1_instruct_2b_en"`

specifies the preset the Gemma 2B model with 2 billion parameters. Gemma models with 7B, 9B, and 27B parameters are also available. You can find the code strings for Gemma models in their **Model Variation** listings on kaggle.com.

This model variant is fine-tuned for conversations and to answer questions in a more natural manner.

Use the `summary`

method to get more info about the model:

```
gemma_lm.summary()
```

As you can see from the summary, the model has 2.5 billion trainable parameters.

### Define formatting helper functions

```
from IPython.display import Markdown
import textwrap
def display_chat(prompt, text):
formatted_prompt = "<font size='+1' color='brown'>🙋♂️<blockquote>" + prompt + "</blockquote></font>"
text = text.replace('•', ' *')
text = textwrap.indent(text, '> ', predicate=lambda _: True)
formatted_text = "<font size='+1' color='teal'>🤖\n\n" + text + "\n</font>"
return Markdown(formatted_prompt+formatted_text)
def to_markdown(text):
text = text.replace('•', ' *')
return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
```

## Building the chatbot

The Gemma instruction-tuned model `gemma_1.1_instruct_2b_en`

is fine-tuned to understand the following turn tokens:

```
<start_of_turn>user\n ... <end_of_turn>\n
<start_of_turn>model\n ... <end_of_turn>\n
```

This tutorial uses these tokens to build the chatbot. Refer to Formatting and system instructions for more information on Gemma control tokens.

### Create a chat helper to manage the conversation state

```
class ChatState():
"""
Manages the conversation history for a turn-based chatbot
Follows the turn-based conversation guidelines for the Gemma family of models
documented at https://ai.google.dev/gemma/docs/formatting
"""
__START_TURN_USER__ = "<start_of_turn>user\n"
__START_TURN_MODEL__ = "<start_of_turn>model\n"
__END_TURN__ = "<end_of_turn>\n"
def __init__(self, model, system=""):
"""
Initializes the chat state.
Args:
model: The language model to use for generating responses.
system: (Optional) System instructions or bot description.
"""
self.model = model
self.system = system
self.history = []
def add_to_history_as_user(self, message):
"""
Adds a user message to the history with start/end turn markers.
"""
self.history.append(self.__START_TURN_USER__ + message + self.__END_TURN__)
def add_to_history_as_model(self, message):
"""
Adds a model response to the history with start/end turn markers.
"""
self.history.append(self.__START_TURN_MODEL__ + message + self.__END_TURN__)
def get_history(self):
"""
Returns the entire chat history as a single string.
"""
return "".join([*self.history])
def get_full_prompt(self):
"""
Builds the prompt for the language model, including history and system description.
"""
prompt = self.get_history() + self.__START_TURN_MODEL__
if len(self.system)>0:
prompt = self.system + "\n" + prompt
return prompt
def send_message(self, message):
"""
Handles sending a user message and getting a model response.
Args:
message: The user's message.
Returns:
The model's response.
"""
self.add_to_history_as_user(message)
prompt = self.get_full_prompt()
response = self.model.generate(prompt, max_length=1024)
result = response.replace(prompt, "") # Extract only the new response
self.add_to_history_as_model(result)
return result
```

### Chat with the model

Start chatting with the model.

```
chat = ChatState(gemma_lm)
message = "Tell me, in a few words, how to compute all prime numbers up to 1000?"
display_chat(message, chat.send_message(message))
```

🙋♂️

Tell me, in a few words, how to compute all prime numbers up to 1000?🤖

The Sieve of Eratosthenes is a widely used method to compute all prime numbers up to a given limit. It involves iteratively marking out multiples of each prime number.

Continue the conversation.

```
message = "Now in Python! No numpy, please!"
display_chat(message, chat.send_message(message))
```

🙋♂️

Now in Python! No numpy, please!🤖

`def prime(n): if n <= 1: return False for i in range(2, int(n**0.5) + 1): if n % i == 0: return False return True`

```
message = "Thank you, it works! Can you explain the code in plain English?"
display_chat(message, chat.send_message(message))
```

🙋♂️

Thank you, it works! Can you explain the code in plain English?🤖

The provided code defines a function

`prime`

that checks whether a given number is prime or not.

How it works:

The function takes a single argument,

`n`

, which is the number to check.It first checks if

`n`

is less than or equal to 1. If it is, the number is not prime, so the function returns`False`

.It then enters a loop that iterates through numbers from 2 to the square root of

`n`

.For each number

`i`

, it checks if`n`

is divisible evenly by`i`

(i.e.,`n % i == 0`

).If

`n`

is divisible by`i`

, the function returns`False`

because`n`

cannot be prime if it has a divisor.If the loop completes without finding any divisors for

`n`

, the function returns`True`

, indicating that`n`

is a prime number.

Example Usage:`>>> prime(2) True >>> prime(3) True >>> prime(4) False >>> prime(5) True`

Benefits of this Code:

- It is a simple and efficient algorithm for finding prime numbers.
- It is widely used in various computer science and mathematical applications.

```
message = "Great! Now add those explanations as comments in the code."
display_chat(message, chat.send_message(message))
```

🙋♂️

Great! Now add those explanations as comments in the code.🤖

`def prime(n): """ Checks whether a given number is prime or not. Args: n: The number to check. Returns: True if n is prime, False otherwise. """ # Check if n is less than or equal to 1. if n <= 1: return False # Iterate through numbers from 2 to the square root of n. for i in range(2, int(n**0.5) + 1): # Check if n is divisible by i. if n % i == 0: return False # If the loop completes without finding any divisors for n, then n is prime. return True`

Test the generated response by running the generated code:

```
def is_prime(n):
"""
Checks if a number is prime.
Args:
n: The number to check.
Returns:
True if n is prime, False otherwise.
"""
# If n is less than or equal to 1, it is not prime.
if n <= 1:
return False
# Iterate through all the numbers from 2 to the square root of n.
for i in range(2, int(n**0.5) + 1):
# If n is divisible by any of the numbers in the range from 2 to the square root of n, it is not prime.
if n % i == 0:
return False
# If no divisors are found, n is prime.
return True
# Initialize an empty list to store prime numbers.
primes = []
# Iterate through all the numbers from 2 to 1000.
for i in range(2, 1001):
# If the number is prime, add it to the list.
if is_prime(i):
primes.append(i)
# Print the prime numbers.
print(primes)
```

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

Use the `get_history`

method to see how all the context was retained by the `Chat`

class.

```
print(chat.get_history())
```

<start_of_turn>user Tell me, in a few words, how to compute all prime numbers up to 1000?<end_of_turn> <start_of_turn>model The Sieve of Eratosthenes is a widely used method to compute all prime numbers up to a given limit. It involves iteratively marking out multiples of each prime number.<end_of_turn> <start_of_turn>user Now in Python! No numpy, please!<end_of_turn> <start_of_turn>model ```python def prime(n): if n <= 1: return False for i in range(2, int(n**0.5) + 1): if n % i == 0: return False return True ```<end_of_turn> <start_of_turn>user Thank you, it works! Can you explain the code in plain English?<end_of_turn> <start_of_turn>model The provided code defines a function `prime` that checks whether a given number is prime or not. **How it works:** - The function takes a single argument, `n`, which is the number to check. - It first checks if `n` is less than or equal to 1. If it is, the number is not prime, so the function returns `False`. - It then enters a loop that iterates through numbers from 2 to the square root of `n`. - For each number `i`, it checks if `n` is divisible evenly by `i` (i.e., `n % i == 0`). - If `n` is divisible by `i`, the function returns `False` because `n` cannot be prime if it has a divisor. - If the loop completes without finding any divisors for `n`, the function returns `True`, indicating that `n` is a prime number. **Example Usage:** ```python >>> prime(2) True >>> prime(3) True >>> prime(4) False >>> prime(5) True ``` **Benefits of this Code:** - It is a simple and efficient algorithm for finding prime numbers. - It is widely used in various computer science and mathematical applications.<end_of_turn> <start_of_turn>user Great! Now add those explanations as comments in the code.<end_of_turn> <start_of_turn>model ```python def prime(n): """ Checks whether a given number is prime or not. Args: n: The number to check. Returns: True if n is prime, False otherwise. """ # Check if n is less than or equal to 1. if n <= 1: return False # Iterate through numbers from 2 to the square root of n. for i in range(2, int(n**0.5) + 1): # Check if n is divisible by i. if n % i == 0: return False # If the loop completes without finding any divisors for n, then n is prime. return True ```<end_of_turn>

## Summary and further reading

In this tutorial, you learned how to chat with the Gemma 2B Instruction tuned model using Keras on JAX.

Check out these guides and tutorials to learn more about Gemma:

- Get started with Keras Gemma.
- Finetune the Gemma model on GPU.
- Learn about Gemma integration with Vertex AI
- Learn how to use Gemma models with Vertex AI.