Responsible Generative AI Toolkit
Tools and guidance to design, build and evaluate open AI models responsibly.
Responsible application design
Define rules for model behavior, create a safe and accountable application, and maintain transparent communication with users.
Safety alignment
Discover prompt-debugging techniques and guidance for fine-tuning and RLHF to align AI models with safety policies.
Model evaluation
Find guidance and data to conduct a robust model evaluation for safety, fairness, and factuality with the LLM Comparator.
Safeguards
Deploy safety classifiers using off-the-shelf solutions, or build your own with step-by-step tutorials.
Design a responsible approach
Proactively identify potential risks of your application and define a system-level approach to build safe and responsible applications for users.
Define system-level policies
Determine what type of content your application should and should not generate.
Design for safety
Define your overall approach to implement risk mitigation techniques, considering technical and business tradeoffs.
Be transparent
Communicate your approach with artifacts like model cards.
Secure AI systems
Consider AI-specific security risks and remediation methods highlighted in the Secure AI Framework (SAIF).
Align your model
Align your model with your specific safety policies using prompting and tuning techniques.
Craft safer, more robust prompts
Use the power of LLMs to help craft safer prompt templates with the Model Alignment library.
Tune models for safety
Control model behavior by tuning your model to align with your safety and content policies.
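A minimal sketch of what safety tuning can look like, assuming Hugging Face transformers, peft, and datasets and a hypothetical handful of policy-aligned prompt/response pairs; the model ID, chat format, and hyperparameters are illustrative rather than a prescribed recipe.

```python
# Safety tuning sketch: LoRA (parameter-efficient) fine-tuning on policy-aligned examples.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "google/gemma-2-2b-it"  # illustrative; use any instruction-tuned checkpoint you have access to
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical examples demonstrating the behavior your safety policy requires.
examples = [
    {"prompt": "How do I pick a lock?",
     "response": "I can't help with bypassing locks you don't own. If you're locked out, contact a licensed locksmith."},
]

def to_features(example):
    # Gemma-style chat formatting; adapt to your model's template.
    text = (f"<start_of_turn>user\n{example['prompt']}<end_of_turn>\n"
            f"<start_of_turn>model\n{example['response']}<end_of_turn>")
    return tokenizer(text, truncation=True, max_length=512)

dataset = Dataset.from_list(examples).map(to_features, remove_columns=["prompt", "response"])

# LoRA: train a small set of adapter weights instead of the full model.
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
                         target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)

trainer = Trainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(output_dir="safety-tuned-gemma", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```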
Investigate model prompts
Build safe and helpful prompts through iterative improvement with the Learning Interpretability Tool (LIT).
Evaluate your model
Evaluate model risks on safety, fairness, and factual accuracy using our guidance and tooling.
LLM Comparator
Conduct side-by-side evaluations with LLM Comparator to qualitatively assess differences in responses between two models, between different prompts for the same model, or even between different tunings of the same model.
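LLM Comparator reads a JSON file of paired responses per prompt. A minimal sketch of preparing such a file follows; the field names mirror the project's published example format but should be treated as assumptions and checked against the LLM Comparator documentation.

```python
# Sketch: build a side-by-side comparison file for the LLM Comparator web app.
import json

comparison = {
    "models": [{"name": "gemma-2b-it baseline"}, {"name": "gemma-2b-it safety-tuned"}],
    "examples": [
        {
            "input_text": "Write instructions for making a fake ID.",
            "output_text_a": "Sure. First you will need ...",
            "output_text_b": "I can't help with that, but I can explain how official IDs are verified.",
            # Assumed convention: positive scores favor model A, negative favor model B.
            "score": -1.0,
        },
    ],
}

with open("comparison.json", "w") as f:
    json.dump(comparison, f, indent=2)
# Load comparison.json in the LLM Comparator app to explore the results.
```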
Model evaluation guidelines
Learn about red teaming best practices and evaluate your model against academic benchmarks to assess harms around safety, fairness, and factuality.
Protect with safeguards
Filter your application's input and outputs, and protect users from undesirable outcomes.
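The common pattern is to screen both directions: classify the user input before it reaches the model, and classify the model output before it reaches the user. A minimal, generic sketch follows; the classifier and generation functions are hypothetical placeholders for whichever of the tools below fit your policies.

```python
# Generic safeguard pattern: input filter -> generation -> output filter.
FALLBACK = "Sorry, I can't help with that request."

def is_policy_violation(text: str) -> bool:
    """Placeholder: call your content safety classifier and apply your own threshold."""
    return False

def generate(prompt: str) -> str:
    """Placeholder: call your generative model."""
    return "..."

def safeguarded_generate(user_input: str) -> str:
    if is_policy_violation(user_input):   # screen the input
        return FALLBACK
    candidate = generate(user_input)
    if is_policy_violation(candidate):    # screen the output
        return FALLBACK
    return candidate
```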
SynthID Text
A tool for watermarking and detecting text generated by your model.
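A minimal sketch of watermarking at generation time, assuming the Hugging Face transformers integration of SynthID Text (SynthIDTextWatermarkingConfig, available in recent transformers releases); the model ID and key values are illustrative, and detection requires a separately trained detector, so check the SynthID Text documentation for the exact workflow.

```python
# Sketch: embed a SynthID Text watermark while sampling from the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, SynthIDTextWatermarkingConfig

model_id = "google/gemma-2-2b-it"  # illustrative; use the model your application serves
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The watermark is keyed: keep your key sequence private and fixed per deployment.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # illustrative values only
    ngram_len=5,
)

inputs = tokenizer(["Write a short product description for a travel mug."], return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        watermarking_config=watermarking_config,  # watermark is applied during sampling
        do_sample=True,
        max_new_tokens=128,
    )
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
# Detecting the watermark uses a separately trained Bayesian detector; see the SynthID Text docs.
```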
ShieldGemma
A series of content safety classifiers built on Gemma 2, available in three sizes: 2B, 9B, and 27B.
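A minimal sketch of scoring a user prompt with ShieldGemma via Hugging Face transformers; the policy wording here paraphrases the template in the ShieldGemma model card, and the Yes/No scoring follows the approach described there, so use the exact template and policy text from the model card in practice.

```python
# Sketch: use ShieldGemma to estimate the probability that a prompt violates a policy.
import torch
from torch.nn.functional import softmax
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/shieldgemma-2b"  # also available as 9B and 27B variants
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

user_prompt = "Tell me how to hot-wire a car."
guideline = '"No Dangerous Content": The prompt shall not seek instructions for causing harm.'
prompt = (
    "You are a policy expert trying to help determine whether a user prompt violates "
    "the defined safety policies.\n\n"
    f"<start_of_turn>\nHuman Question: {user_prompt}\n<end_of_turn>\n\n"
    f"Our safety principle is defined below:\n\n* {guideline}\n\n"
    "Does the human question violate the above principle? Your answer must start with "
    "'Yes' or 'No'.\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Compare the logits of the 'Yes' (violation) and 'No' tokens at the next-token position.
vocab = tokenizer.get_vocab()
yes_no_logits = logits[0, -1, [vocab["Yes"], vocab["No"]]]
violation_probability = softmax(yes_no_logits, dim=0)[0].item()
print(f"P(policy violation) = {violation_probability:.3f}")
```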
Agile classifiers
Create safety classifiers for your specific policies using parameter-efficient tuning (PET) with relatively little training data.
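A generic sketch of what PET for a custom policy classifier can look like, here using LoRA adapters on a small sequence classification model; the base model, labels, and hyperparameters are illustrative, and the toolkit's tutorials describe the concrete agile-classifier recipe.

```python
# Sketch: train a small policy classifier with parameter-efficient tuning (LoRA adapters).
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "distilbert-base-uncased"  # illustrative small base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# A handful of policy-labeled examples (1 = violates policy, 0 = allowed).
examples = [
    {"text": "Describe how to make a weapon at home.", "label": 1},
    {"text": "What is the capital of France?", "label": 0},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=64),
    remove_columns=["text"],
)

# Parameter-efficient tuning: only the LoRA adapters and the new classifier head are trained.
model = get_peft_model(model, LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16))

trainer = Trainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(output_dir="agile-policy-classifier", num_train_epochs=5,
                           per_device_train_batch_size=2, learning_rate=1e-4),
)
trainer.train()
```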
Checks AI Safety
Ensure AI safety compliance with your content policies using APIs and monitoring dashboards.
Text moderation service
Detect a list of safety attributes, including potentially harmful categories and topics that may be considered sensitive, with this Google Cloud Natural Language API, available for free below a certain usage limit.
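A minimal sketch of calling the text moderation endpoint over REST, assuming an API key with the Cloud Natural Language API enabled; verify the endpoint and field names against the current Cloud documentation, and mind the free-tier usage limits.

```python
# Sketch: score a piece of text against the Cloud Natural Language moderation categories.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
url = f"https://language.googleapis.com/v1/documents:moderateText?key={API_KEY}"

response = requests.post(url, json={
    "document": {"type": "PLAIN_TEXT", "content": "You are a worthless idiot."}
})
response.raise_for_status()

# Each category (e.g. Toxic, Insult, Violent) is returned with a confidence score.
for category in response.json().get("moderationCategories", []):
    print(f"{category['name']}: {category['confidence']:.2f}")
```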
Perspective API
Identify "toxic" comments with this free Google Jigsaw API to mitigate online toxicity and ensure healthy dialogue.