Agile classifiers are an efficient and flexible method
for creating custom content policy classifiers by tuning models, such as Gemma,
to fit your needs. They also give you complete control over where and how they
are deployed.
The codelab and
tutorial use LoRA to fine-tune a Gemma
model to act as a content policy classifier using the KerasNLP
library. Using only 200 examples from the ETHOS dataset, this
classifier achieves an F1 score of 0.80 and a ROC-AUC score
of 0.78, which compares favorably to state-of-the-art
leaderboard results. When trained on 800 examples,
like the other classifiers on the leaderboard, the Gemma-based agile classifier
achieves an F1 score of 83.74 and a ROC-AUC score of 88.17 (both reported on a
0 to 100 scale, unlike the 0 to 1 figures above). You can adapt the
tutorial instructions to further refine this classifier, or to create your own
custom safety classifiers.
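
To make the reported metrics easier to interpret when you evaluate your own classifier, here is a minimal sketch of how F1 and ROC-AUC can be computed from binary labels and classifier scores. It is not taken from the codelab or tutorial: the function names, the 0.5 decision threshold, and the toy data are all assumptions chosen for illustration, and in practice you would likely use an established metrics library instead.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is fine
    for a few hundred evaluation examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = violates the policy, 0 = does not.
labels = [1, 0, 1, 1, 0, 0]
# Hypothetical classifier scores, e.g. the probability assigned to the "violating" class.
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

# Assumed decision threshold of 0.5; tune this for your own policy.
predictions = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(labels, predictions)
auc = roc_auc(labels, scores)

# Both functions return values in [0, 1]; multiply by 100 to compare
# with results reported on a 0 to 100 scale.
print(f"F1: {f1:.4f} ({100 * f1:.2f} on a 0-100 scale)")
print(f"ROC-AUC: {auc:.4f} ({100 * auc:.2f} on a 0-100 scale)")
```

Note that F1 depends on the chosen threshold, while ROC-AUC is computed from the raw scores and is threshold-independent, so the two numbers can move in different directions as you refine a classifier.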
Last updated 2024-10-23 UTC.