Assess risks and set safety policies

Content safety policies define what types of harmful content are not permitted on an online platform. You may be familiar with content policies from platforms like YouTube or Google Play. Content policies for generative AI applications are similar: they define what types of content your application should not generate, and they guide both how to tune models and which safeguards to add.

Your policies should reflect your application's use case. For example, a generative AI product intended to offer ideas for family activities based on community suggestions might have a policy that prohibits the generation of violent content, as it could be harmful to users. Conversely, an application that summarizes science fiction story ideas proposed by users may want to allow the generation of violent content, since violence is a subject of many stories in this genre.

Your safety policies should prohibit the generation of content that is harmful to users or illegal, and should specify what types of generated content meet that bar for your application. You may also want to consider including exceptions for educational, documentary, scientific, or artistic content that might otherwise be considered harmful.

Defining clear policies at a highly granular level of detail, including exceptions to the policy with examples, is fundamental to building a responsible product. Your policies are used at each step of your model development. For data cleaning or labeling, imprecision can lead to mislabeled data, or to over-removal or under-removal of data, which will impact your model's safety responses. For evaluation purposes, ill-defined policies will lead to high inter-rater variance, making it more difficult to know whether your model meets your safety standards.
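Inter-rater variance can be made concrete with an agreement statistic. The sketch below (an illustration, not part of any official tooling) computes Cohen's kappa by hand for two hypothetical raters labeling the same model outputs against a policy; a low kappa suggests the policy wording is too ambiguous for consistent labeling:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the raters labeled independently at their own rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two raters judging 8 outputs against one policy.
rater_1 = ["violates", "ok", "ok", "violates", "ok", "ok", "violates", "ok"]
rater_2 = ["violates", "ok", "violates", "violates", "ok", "ok", "ok", "ok"]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.467 - weak agreement
```

A kappa near 1 indicates the policy is interpreted consistently; values this low are a signal to add more detail and worked examples to the policy before scaling up labeling.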

Hypothetical policies (for illustration only)

The following are some examples of policies you might consider using for your application, provided they match your use case.

| Policy category | Policy |
| --- | --- |
| Sensitive Personally Identifiable Information (SPII) | The application will not recite sensitive and personally identifiable information (e.g., the email address, credit card number, or social security number of a private individual). |
| Hate Speech | The application will not generate negative or harmful content targeting identity and/or protected attributes (e.g., racial slurs, promotion of discrimination, calls to violence against protected groups). |
| Harassment | The application will not generate malicious, intimidating, bullying, or abusive content targeting another individual (e.g., physical threats, denial of tragic events, disparaging victims of violence). |
| Dangerous Content | The application will not generate instructions or advice on harming oneself and/or others (e.g., accessing or building firearms and explosive devices, promotion of terrorism, instructions for suicide). |
| Sexually Explicit | The application will not generate content that contains references to sexual acts or other lewd content (e.g., sexually graphic descriptions, content aimed at causing arousal). |
| Enabling Access to Harmful Goods and Services | The application will not generate content that promotes or enables access to potentially harmful goods, services, and activities (e.g., facilitating access to or promoting gambling, pharmaceuticals, fireworks, or sexual services). |
| Malicious Content | The application will not generate instructions for performing illegal or deceptive activities (e.g., generating phishing scams, spam or content intended for mass solicitation, jailbreaking methods). |
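Policies like these become actionable when each category is paired with a safeguard that screens generated output. The sketch below is purely illustrative: the `Policy` structure, the `POLICIES` list, and the keyword-based checkers are assumptions standing in for real safety classifiers (e.g., a trained moderation model), not a real API:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Policy:
    category: str                    # e.g., "Hate Speech"
    statement: str                   # what the application will not generate
    checker: Callable[[str], bool]   # True if the text violates this policy

def contains_any(terms):
    """Toy checker: flags text containing any listed phrase.
    A real safeguard would use a trained classifier, not keywords."""
    return lambda text: any(t in text.lower() for t in terms)

# Two hypothetical policies from the table above, each wired to a checker.
POLICIES = [
    Policy("Sensitive PII",
           "Will not recite SPII such as a private individual's credit card number",
           contains_any(["credit card number", "social security number"])),
    Policy("Dangerous Content",
           "Will not give instructions for harming oneself or others",
           contains_any(["build an explosive", "instructions for suicide"])),
]

def violated_policies(generated_text: str) -> List[str]:
    """Return the category of every policy the generated text violates."""
    return [p.category for p in POLICIES if p.checker(generated_text)]

print(violated_policies("Sure, my social security number is ..."))  # ['Sensitive PII']
```

Keeping the policy statement and its enforcement check in one place makes it easier to keep labeling guidelines, evaluation rubrics, and runtime safeguards aligned as policies evolve.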

Developer resources

Examples of generative AI policies: