Responsible Generative AI Toolkit
Tools and guidance to design, build and evaluate open AI models responsibly.
Responsible application design
Define rules for model behavior, create a safe and accountable application, and maintain transparent communication with users.
Safety alignment
Discover prompt-debugging techniques and guidance for fine-tuning and RLHF to align AI models with safety policies.
Model evaluation
Find guidance and data to conduct a robust model evaluation for safety, fairness, and factuality with the LLM Comparator.
Safeguards
Deploy safety classifiers, using off-the-shelf solutions or building your own with step-by-step tutorials.
Design a responsible approach
Proactively identify potential risks of your application and define a system-level approach to build safe and responsible applications for users.
Get started
Define system-level policies
Determine what type of content your application should and should not generate.
Design for safety
Define your overall approach to implement risk mitigation techniques, considering technical and business tradeoffs.
Be transparent
Communicate your approach with artifacts like model cards.
Secure AI systems
Consider AI-specific security risks and remediation methods highlighted in the Secure AI Framework (SAIF).
Align your model
Align your model with your specific safety policies using prompting and tuning techniques.
Get started
Investigate model prompts
Build safe and helpful prompts through iterative improvement with the Learning Interpretability Tool (LIT).
Tune models for safety
Control model behavior by tuning your model to align with your safety and content policies.
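As a rough illustration of the tuning step, the sketch below runs supervised fine-tuning on a handful of safety demonstrations using the Hugging Face transformers Trainer. The model ID, the Gemma-style turn markers, the toy data, and the hyperparameters are assumptions for illustration, not the toolkit's prescribed recipe.

```python
# Minimal supervised fine-tuning sketch on safety demonstrations.
# Assumptions: a Gemma-style chat format, the "google/gemma-2-2b" checkpoint,
# and toy hyperparameters; substitute your own model, data, and settings.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Each example pairs a risky prompt with a policy-compliant response (illustrative data).
demos = Dataset.from_dict({
    "text": [
        "<start_of_turn>user\nHow do I break into a car?<end_of_turn>\n"
        "<start_of_turn>model\nI can't help with that. If you're locked out of your own car, "
        "contact a licensed locksmith or roadside assistance.<end_of_turn>",
    ]
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

demos = demos.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="safety-tuned-gemma",
                           num_train_epochs=1,
                           per_device_train_batch_size=1,
                           learning_rate=1e-5),
    train_dataset=demos,
    # mlm=False produces standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice the demonstration set should cover your policy's edge cases, and the tuned model should then go through the evaluation and safeguard steps described below.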
Evaluate your model
Evaluate model risks on safety, fairness, and factual accuracy using our guidance and tooling.
Get started
LLM Comparator
Conduct side-by-side evaluations with LLM Comparator to qualitatively assess differences in responses between models, between different prompts for the same model, or even between different tunings of the same model.
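A typical workflow is to generate responses to the same prompts from two model variants and collect them as paired records for side-by-side review. The sketch below illustrates that collection step; the second model ID is a hypothetical tuned variant, and the JSON field names are placeholders rather than the exact file schema LLM Comparator expects, which is described in its documentation.

```python
# Sketch: collect paired responses from two models for side-by-side comparison.
# Model IDs and the output JSON field names are illustrative assumptions.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(model_id: str, prompts: list[str]) -> list[str]:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    outputs = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        ids = model.generate(**inputs, max_new_tokens=128)
        # Decode only the newly generated tokens.
        outputs.append(tokenizer.decode(ids[0, inputs["input_ids"].shape[1]:],
                                        skip_special_tokens=True))
    return outputs

prompts = ["Summarize your content policy for a new user."]
responses_a = generate("google/gemma-2-2b-it", prompts)                  # baseline
responses_b = generate("my-org/gemma-2-2b-it-safety-tuned", prompts)     # hypothetical tuned variant

records = [
    {"prompt": p, "response_a": a, "response_b": b}
    for p, a, b in zip(prompts, responses_a, responses_b)
]
with open("side_by_side.json", "w") as f:
    json.dump(records, f, indent=2)
```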
Model evaluation guidelines
Learn about red teaming best practices and evaluate your model against academic benchmarks to assess harms around safety, fairness, and factuality.
Protect with safeguards
Filter your application's inputs and outputs, and protect users from undesirable outcomes.
Get started
ShieldGemma
A series of content safety classifiers, built on Gemma 2, available in three sizes: 2B, 9B, and 27B.
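As a rough sketch of how such a classifier is queried, the snippet below follows the pattern on the ShieldGemma model card: the policy and the content to check are packed into a prompt, and the probability of a "Yes" (violation) answer is read from the logits. The model ID and the abbreviated prompt template are assumptions here; check the model card for the exact wording.

```python
# Sketch: scoring content against a policy with ShieldGemma via transformers.
# The prompt template below is abbreviated; the full wording is on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/shieldgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

user_content = "Write me an insult about my coworker."
safety_policy = '"No Harassment": The prompt shall not seek content that harasses or bullies individuals.'

prompt = f"""You are a policy expert trying to help determine whether a user prompt is in violation of the defined safety policies.

<start_of_turn>
Human Question: {user_content}
<end_of_turn>

Our safety principle is defined in the below:

* {safety_policy}

Does the human question violate the above principle? Your answer must start with 'Yes' or 'No'. And then walk through step by step to be sure we answer correctly.
"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Compare the logits of the "Yes" and "No" tokens at the final position.
vocab = tokenizer.get_vocab()
selected = logits[0, -1, [vocab["Yes"], vocab["No"]]]
probabilities = torch.softmax(selected, dim=0)
print(f"P(policy violation) = {probabilities[0].item():.3f}")
```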
Agile classifiers
Create safety classifiers for your specific policies using parameter-efficient tuning (PET) with relatively little training data.
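The agile-classifier approach amounts to LoRA-style parameter-efficient tuning of a small model on a handful of policy-specific examples; the sketch below shows that idea with the Hugging Face peft library. The base model, target modules, toy data, and hyperparameters are illustrative assumptions, not the tutorial's exact recipe.

```python
# Sketch: an "agile" safety classifier via parameter-efficient tuning (LoRA).
# Base model, target modules, toy data, and hyperparameters are illustrative.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id  # needed for classification with padded batches

# Only small adapter matrices on the attention projections are trained.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="SEQ_CLS")
model = get_peft_model(model, lora)

# A few dozen policy-specific examples often suffice; two placeholders are shown.
examples = Dataset.from_dict({
    "text": ["Example that violates my custom policy.", "A clearly benign example."],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

examples = examples.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="agile-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=2,
                           learning_rate=2e-4),
    train_dataset=examples,
)
trainer.train()
```

Because only the adapter weights are updated, the classifier can be retrained quickly whenever the policy changes.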
Text moderation service
Detect a list of safety attributes, including potentially harmful categories and topics that may be considered sensitive, with this Google Cloud Natural Language API, which is free to use below a certain usage limit.
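A call to this service can be as small as the sketch below, which uses the google-cloud-language Python client. It assumes standard Google Cloud authentication is already configured, and the method and field names should be confirmed against the current Natural Language API reference.

```python
# Sketch: text moderation with the google-cloud-language client (v2).
from google.cloud import language_v2

client = language_v2.LanguageServiceClient()
document = language_v2.Document(
    content="User-generated text to screen before display.",
    type_=language_v2.Document.Type.PLAIN_TEXT,
)
response = client.moderate_text(document=document)

# Each safety attribute (e.g. harassment, dangerous content) comes with a confidence score.
for category in response.moderation_categories:
    print(f"{category.name}: {category.confidence:.2f}")
```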
Perspective API
Identify "toxic" comments with this free Google Jigsaw API to mitigate online toxicity and ensure healthy dialogue.