RecurrentGemma is an open model based on Griffin, a hybrid model architecture that mixes gated linear recurrences with local sliding window attention.
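
To make the two ingredients concrete, here is a minimal sketch in plain Python. It is a generic illustration, not Griffin's actual recurrent layer. The update rule `h_t = a_t * h_(t-1) + (1 - a_t) * x_t`, the function names, and the hand-picked gate values are all assumptions made for this example. In a real model the gates would be computed from the input by learned parameters.

```python
def gated_linear_recurrence(xs, gates):
    """Scan a simple gated linear recurrence over a sequence of scalars.

    Illustrative update rule (an assumption, not Griffin's exact layer):
        h_t = a_t * h_(t-1) + (1 - a_t) * x_t
    The state h has a fixed size, so memory does not grow with sequence length.
    """
    h = 0.0
    outputs = []
    for x, a in zip(xs, gates):
        h = a * h + (1.0 - a) * x  # gate a decides how much past state to keep
        outputs.append(h)
    return outputs


def sliding_window_mask(seq_len, window):
    """Build a causal local-attention mask.

    mask[i][j] is True when position i may attend to position j, that is,
    when j is not in the future and lies within the last `window` positions
    (counting position i itself).
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


# Hypothetical inputs chosen only for illustration.
states = gated_linear_recurrence([1.0, 1.0, 0.0], [0.5, 0.5, 0.5])
mask = sliding_window_mask(seq_len=4, window=2)
```

The recurrence carries information forward through a single running state, while the mask limits each position to a short window of recent positions. Neither part needs to keep the full history in memory.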

Like Gemma, RecurrentGemma is well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. However, RecurrentGemma's unique architecture offers the following additional advantages:

  • Lower memory requirements allow for the generation of longer samples on devices with limited memory, like single GPUs or CPUs.
  • RecurrentGemma can perform inference at significantly higher batch sizes, so it can generate substantially more tokens per second, especially when generating long sequences.
  • RecurrentGemma matches Gemma's performance while requiring less memory and achieving faster inference.
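
The memory advantage can be illustrated with some rough accounting. The sketch below counts cached values per sequence under simplifying assumptions: a full-attention layer stores one key and one value vector per generated token, a sliding-window layer stores them only for the most recent `window` tokens, and a recurrent layer stores one fixed-size state vector. The layer counts, width, and window size are hypothetical and are not RecurrentGemma's actual configuration.

```python
def full_attention_cache_entries(num_tokens, num_layers, width):
    """Cached values when every layer keeps keys and values for all tokens."""
    return num_layers * num_tokens * 2 * width  # 2 = one key + one value


def hybrid_cache_entries(num_tokens, num_attn_layers, num_recurrent_layers,
                         width, window):
    """Cached values for a mix of sliding-window attention and recurrent layers.

    Assumes (for illustration only) that each recurrent layer keeps a single
    state vector of size `width`.
    """
    attention = num_attn_layers * min(num_tokens, window) * 2 * width
    recurrent = num_recurrent_layers * width
    return attention + recurrent


# Hypothetical configuration: 4 layers in total, width 16, window of 2048 tokens.
for tokens in (1_000, 8_192, 100_000):
    full = full_attention_cache_entries(tokens, num_layers=4, width=16)
    hybrid = hybrid_cache_entries(tokens, num_attn_layers=1,
                                  num_recurrent_layers=3, width=16, window=2048)
    print(f"{tokens} tokens: full attention={full}, hybrid={hybrid}")
```

Under these assumptions the full-attention cache grows linearly with the number of tokens, while the hybrid count stops growing once the sequence is longer than the window. The memory freed this way is what makes room for longer samples or larger batches.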

More resources

  • RecurrentGemma's model card contains detailed information about the model, its implementation, its evaluation, its usage and limitations, and more.
  • View more code, Colab notebooks, information, and discussions about RecurrentGemma on Kaggle.
  • Run example Colab notebooks for JAX and PyTorch on GitHub.