Agile Classifiers: Customized content policy classifiers

Agile classifiers are an efficient and flexible way to create custom content policy classifiers by tuning models, such as Gemma, to fit your needs. They also give you complete control over where and how they are deployed.

Gemma Agile Classifier Tutorials

The codelab and tutorial use LoRA to fine-tune a Gemma model as a content policy classifier with the KerasNLP library. Using only 200 examples from the ETHOS dataset, this classifier achieves an F1 score of 0.80 and a ROC-AUC score of 0.78, which compares favorably to state-of-the-art leaderboard results. When trained on 800 examples, like the other classifiers on the leaderboard, the Gemma-based agile classifier achieves an F1 score of 83.74 and a ROC-AUC score of 88.17 (both reported on a 0 to 100 scale). You can adapt the tutorial instructions to further refine this classifier, or to create your own custom safety classifiers.
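If you adapt the tutorial and want to check your own classifier against the figures above, it helps to be clear about how F1 and ROC-AUC are computed. The following is a minimal, dependency-free sketch, not the tutorial's evaluation code. The function names `f1_score` and `roc_auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. Both functions return values between 0 and 1; multiply by 100 to compare against scores reported on a 0 to 100 scale.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score. Ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = violates the policy, 0 = does not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical classifier scores, e.g. the probability assigned to "violates".
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.7]
# Assumed decision threshold of 0.5 to turn scores into hard labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1: {f1:.2f} ({100 * f1:.2f} on a 0 to 100 scale)")
print(f"ROC-AUC: {auc:.2f} ({100 * auc:.2f} on a 0 to 100 scale)")
```

Note that F1 depends on the chosen threshold, while ROC-AUC is computed from the raw scores and is threshold-independent. The pairwise ROC-AUC loop is quadratic in the number of examples, which is fine for a few hundred evaluation examples; for larger sets, a library implementation is the more practical choice.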