Using LIT with Gemma


Generative AI products are relatively new, and their behavior can vary more than that of earlier forms of software. This makes it important to probe the machine learning models being used, examine examples of the model's behavior, and investigate surprises.

The Learning Interpretability Tool (LIT; website, GitHub) is a platform for debugging and analyzing ML models to understand why and how they behave the way they do.

Here, you'll learn how to set up LIT to get more out of Google's Gemma model by using the Sequence Salience module to analyze different prompt engineering approaches.

Setting up LIT to Debug Gemma Prompts

ERROR: pip's dependency resolver does not currently take into account all the 
packages that are installed. This behaviour is the source of the following 
dependency conflicts.
bigframes 0.21.0 requires scikit-learn>=1.2.2, but you have scikit-learn 1.0.2 
which is incompatible.
google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.14.0 
which is incompatible.

These are safe to ignore.

Install LIT and Keras NLP

This notebook uses the KerasNLP implementation of Gemma (more on how to configure this below). You will need recent versions of keras (3.0+), keras-nlp (0.14+), and lit-nlp (1.2+), as well as a Kaggle account to download the base model.

# Keras is included in Colab runtimes, but needs to be updated to v3.0+.
# LIT and Keras NLP are not included by default and must be installed.
# Running this cell may require you to restart your session to ensure the newer
# packages are imported correctly.
pip install -q -U "keras >= 3.0, <4.0" "keras-nlp >= 0.14" "lit-nlp >= 1.2"

Kaggle Access

KerasNLP stores its pre-trained model weights on Kaggle. The kagglehub package is used to authenticate with this service. Be sure to also accept the license agreement for Gemma from your Kaggle account.

See the Appendix at the end for more information on how to set up a Kaggle account.

import kagglehub

kagglehub.login()

Configuring LIT

LIT provides a function, make_notebook_widget(), for configuring our prompt debugging tools in a notebook context.

LIT provides a dataset of sample prompts that accompany the tutorial linked later in this document.

See the comments below for configuring the widget to use different models and/or datasets.

from lit_nlp.examples.prompt_debugging import notebook as lit_pdbnb

# The following function initializes a LIT Notebook Widget. It's configured by
# two required positional arguments:
#
# * `datasets_config`: A list of strings containing the dataset names and
#       paths to load from, as "dataset:path", where path can be a URL or a
#       local file path. The example below uses a special value,
#       `sample_prompts`, to load the example prompts provided in the LIT
#       distribution; no other special values are supported.
# * `models_config`: A list of strings containing the model names and paths to
#       load from, as "model:path", where path can be a URL, a local file path,
#       or the name of a preset for the configured deep learning framework.
#
# LIT supports salience computation for KerasNLP and Hugging Face Transformers
# models running on TensorFlow or PyTorch. Note that all models passed to the
# `models_config` parameter will be loaded using the same framework and runtime.
# You can configure these with the following keyword arguments.
#
# * `dl_framework`: Must be one of "kerasnlp" or "transformers".
# * `dl_runtime`: Must be one of "tensorflow" or "torch".
#
# Changing the `dl_framework` value will affect the authentication method used
# to access Gemma model weights.

lit_widget = lit_pdbnb.make_notebook_widget(
    ['sample_prompts'],
    ["gemma_2b_it:gemma_1.1_instruct_2b_en"],
    dl_framework="kerasnlp",
    dl_runtime="tensorflow",
    batch_size=1,
    max_examples=5,
    precision="bfloat16",
)
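The "name:path" spec strings above pair a display name with a source. Since a path can be a URL, which itself contains colons, splitting on the first colon keeps the path intact. The following sketch illustrates that convention; it is an assumption about how LIT parses these strings, not a guarantee of its exact implementation.

```python
# Split a "name:path" spec on the first colon only, so URL paths that
# themselves contain colons (e.g. "https://...") survive intact.

def parse_spec(spec):
    name, path = spec.split(":", 1)
    return name, path

print(parse_spec("gemma_2b_it:gemma_1.1_instruct_2b_en"))
# → ('gemma_2b_it', 'gemma_1.1_instruct_2b_en')
print(parse_spec("my_model:https://example.com/weights.h5"))
# → ('my_model', 'https://example.com/weights.h5')
```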

Now you can render the UI in a Colab cell.

lit_widget.render()

Prompt Debugging with Sequence Salience

Text-to-text large language models (LLMs), such as Gemma, take an input sequence in the form of tokenized text and generate new tokens that are logical follow-ons or completions.

Salience methods allow you to inspect which parts of an input are important to the model for different parts of its generated output. LIT's Sequence Salience module extends these methods to explain the importance of sequences at multiple levels of granularity: from tokens to words to sentences and beyond.
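One simple way to move from token-level scores to coarser granularities is to pool scores over each group, for example summing subword-token salience into per-word salience. The sketch below uses a "##" continuation marker as an assumed subword convention; LIT's actual aggregation may differ.

```python
# Pool token-level salience scores up to word level by merging subword
# tokens (marked "##") with the word they continue and summing their
# scores. Illustrative only; not LIT's actual implementation.

tokens   = ["The", "cap", "##ital", "of", "France"]
salience = [0.05, 0.25, 0.25, 0.1, 0.35]

words, word_salience = [], []
for tok, s in zip(tokens, salience):
    if tok.startswith("##") and words:
        words[-1] += tok[2:]          # join subword piece onto the word
        word_salience[-1] += s        # pool its salience score
    else:
        words.append(tok)
        word_salience.append(s)

print(words)          # → ['The', 'capital', 'of', 'France']
print(word_salience)  # → [0.05, 0.5, 0.1, 0.35]
```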

You can use LIT in the cell above to experiment with the Sequence Salience module on your own. For a more guided learning experience, you can follow along with the Prompt Debugging with Sequence Salience tutorial right in this Colab.

For more academic and technical detail on how Sequence Salience works, check out our paper.

Appendix: Accessing Gemma on Kaggle Hub

This notebook uses the KerasNLP implementation of Gemma. KerasNLP stores its pre-trained model weights on Kaggle, and Gemma requires authentication and license acknowledgement to access those weights.

The following instructions walk you through how to set up a Kaggle account and authenticate with Kaggle using the kagglehub package.

  1. Create a Kaggle account if you don't have one
  2. Request access to Gemma
    • Make sure you're logged into Kaggle using the account above
    • Go to the consent page: https://www.kaggle.com/models/google/gemma/license/consent
    • Select the "Verify via Kaggle Account" option (the default selection) and click next
    • Complete the consent form (first name and last name fields at the top)
    • Acknowledge the policy using the checkboxes at the bottom
    • Click the "Accept" button at the bottom to be granted access
    • This should redirect you to the model page (https://www.kaggle.com/models/google/gemma)
  3. Create an API token
    • Make sure you're logged into Kaggle using the account you created above
    • Go to the Settings page: https://www.kaggle.com/settings
    • Scroll down to the API section
    • Use the "Create New Token" button to trigger token generation
    • Use the on-screen menu to save the generated JSON file, named kaggle.json, to your machine
    • The JSON file is an object with two properties, username and key; you'll need both to authenticate with the service later
  4. Use your API token credentials to authenticate with kagglehub in Colab
    • Go to the LIT Sequence Saleince Colab: https://colab.sandbox.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lit_gemma.ipynb#scrollTo=yKw8gDsh_nVR
    • Connect to a GPU runtime
    • For Gemma 2B you can use the free-tier T4 runtime
    • For Gemma 7B you will need pre-paid Colab compute credits or a Colab Pro account to use a V100, L4, or A100 GPU
    • Run the kagglehub code cell to display an HTML form that asks for your username and a token
    • Copy the username field from the kaggle.json file you downloaded in the previous step and paste it into the username field in the form
    • Copy the key field from the kaggle.json file you downloaded in the previous step and paste it into the token field in the form
    • Click the login button to save these credentials in your runtime

You will need to repeat the last step any time the Colab runtime is disconnected, as disconnection clears the cache the credentials are stored in.
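As an alternative to the interactive login form, kagglehub also honors the KAGGLE_USERNAME and KAGGLE_KEY environment variables. The sketch below loads them from the kaggle.json file you downloaded in step 3; the file path is an assumption about where you uploaded it in your runtime.

```python
# Load Kaggle credentials from kaggle.json and expose them as the
# environment variables kagglehub checks, so no interactive form is
# needed after a runtime reconnect. The default path is an assumption
# about where you placed the file.
import json
import os

def load_kaggle_credentials(path="kaggle.json"):
    with open(path) as f:
        creds = json.load(f)
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
```

Note that anyone with access to your runtime can read these environment variables, so treat the uploaded kaggle.json with the same care as any other credential file.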