Safety settings

Overview

This guide describes the PaLM API adjustable safety settings available for the text service. During the prototyping stage, you can adjust safety settings across six dimensions to quickly assess whether your application requires a more or less restrictive configuration. By default, safety settings block content with a medium and/or high probability of being unsafe across all six dimensions. This baseline safety is designed to work for most use cases, so you should only adjust your safety settings if doing so is consistently required for your application.

Safety filters

In addition to the adjustable safety filters, the PaLM API has built-in protections against core harms, such as content that endangers child safety. These types of harm are always blocked and cannot be adjusted.

The adjustable safety filters cover the following categories:

  • Derogatory
  • Toxic
  • Sexual
  • Violent
  • Medical
  • Dangerous

These settings allow you, the developer, to determine what is appropriate for your use case. For example, if you're building video game dialogue, you may deem it acceptable to allow more content that's rated as violent or dangerous due to the nature of the game; a brief sketch follows the table below. Here are a few other example use cases that may need some flexibility in these safety settings:

Use Case                       Category
Anti-Harassment Training App   Derogatory, Sexual, Toxic
Medical Exam Study Pal         Medical
Screenplay Writer              Violent, Sexual, Medical, Dangerous
Toxicity classifier            Toxic, Derogatory
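
For instance, a video game dialogue generator could relax the Violent and Dangerous categories to Block few while keeping the defaults for everything else. The following is a minimal sketch; it assumes the Python client library shown in the Code examples section, and the resulting list is what you would pass as the safety_settings argument of your GenerateText call:

from google.generativeai.types import safety_types

safety_settings = [
    # Relax Violent and Dangerous to "Block few": only content with a HIGH
    # probability of being unsafe in these categories is blocked.
    {
        "category": safety_types.HarmCategory.HARM_CATEGORY_VIOLENCE,
        "threshold": safety_types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
    {
        "category": safety_types.HarmCategory.HARM_CATEGORY_DANGEROUS,
        "threshold": safety_types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
]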

Probability vs severity

The PaLM API blocks content based on the probability of the content being unsafe, not its severity. This is important to consider because some content can have a low probability of being unsafe even though its severity of harm could still be high. For example, compare the following sentences:

  1. The robot punched me.
  2. The robot slashed me up.

Sentence 1 might result in a higher probability of being unsafe, but you might consider sentence 2 to be of higher severity in terms of violence.

Given this, it is important for each developer to carefully test and consider what level of blocking is needed to support their key use cases while minimizing harm to end users.

Safety settings

Safety settings are part of the request you send to the text service. They can be adjusted for each request you make to the API. The following table lists the categories that you can set and describes the type of harm that each category encompasses.

Category     Description
Derogatory   Negative or harmful comments targeting identity and/or protected attributes.
Toxic        Content that is rude, disrespectful, or profane.
Sexual       Contains references to sexual acts or other lewd content.
Violent      Describes scenarios depicting violence against an individual or group, or general descriptions of gore.
Dangerous    Promotes, facilitates, or encourages harmful acts.
Medical      Content that is related to medical topics.

You can see these definitions in the API reference as well.

The following table describes the block settings you can adjust for each category. For example, if you set the block setting to Block few for the Derogatory category, everything that has a high probability of being derogatory content is blocked, but anything with a lower probability is allowed.

If not set, the default block setting is Block some or Block most depending on the policy category.

Threshold (Google AI Studio)       Threshold (API)                    Description
Block none                         BLOCK_NONE                         Always show regardless of probability of unsafe content
Block few                          BLOCK_ONLY_HIGH                    Block when high probability of unsafe content
Block some                         BLOCK_MEDIUM_AND_ABOVE             Block when medium or high probability of unsafe content (default for Sexual, Violent, Dangerous, and Medical)
Block most                         BLOCK_LOW_AND_ABOVE                Block when low, medium, or high probability of unsafe content (default for Derogatory and Toxic)
(no Google AI Studio equivalent)   HARM_BLOCK_THRESHOLD_UNSPECIFIED   Threshold is unspecified; block using the default threshold

You can set these settings for each request you make to the text service. See the HarmBlockThreshold API reference for details.
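
As a minimal sketch of this per-request setting, the "Block few for Derogatory" example above maps to BLOCK_ONLY_HIGH in the Python client library (model and prompt are placeholders for your own values; see the Code examples section for the full request):

import google.generativeai as genai
from google.generativeai.types import safety_types

completion = genai.generate_text(
    model=model,
    prompt=prompt,
    safety_settings=[
        # "Block few" for Derogatory: only content with a HIGH probability
        # of being derogatory is blocked for this request.
        {
            "category": safety_types.HarmCategory.HARM_CATEGORY_DEROGATORY,
            "threshold": safety_types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        },
    ],
)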

Safety feedback

If content was blocked, the response from the API contains the reason it was blocked in the ContentFilter.reason field. If the reason was related to safety, then the response also contains a SafetyFeedback field which includes the safety settings that were used for that request as well as a safety rating. The safety rating includes the category and the probability of the harm classification. The content that was blocked is not returned.

The probability returned corresponds to the block confidence levels shown in the following table:

Probability   Description
NEGLIGIBLE    Content has a negligible probability of being unsafe
LOW           Content has a low probability of being unsafe
MEDIUM        Content has a medium probability of being unsafe
HIGH          Content has a high probability of being unsafe

For example, if the content was blocked because the toxicity category had a high probability, the safety rating returned would have the category set to TOXICITY and the harm probability set to HIGH.
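
As a sketch of how you might act on this in the Python client, assuming the request from the Code examples section below and that each rating exposes its category and probability as dictionary keys (as described above):

for feedback in completion.safety_feedback:
    rating = feedback["rating"]
    # Example check: was the content rated as a high probability of toxicity?
    if (rating["category"] == safety_types.HarmCategory.HARM_CATEGORY_TOXICITY
            and rating["probability"] == safety_types.HarmProbability.HIGH):
        print("Blocked: high probability of toxic content.")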

Safety settings in Google AI Studio

You can set these settings in Google AI Studio as well. In the Run settings, click Edit safety settings, and use the knobs to adjust each setting.

A No Content message appears if the content is blocked. To see more details, hold the pointer over No Content and click Safety.

Code examples

This section shows how to use the safety settings in code using the Python client library.

Request example

The following is a Python code snippet showing how to set safety settings in your GenerateText call. This sets the harm categories Derogatory and Violent to BLOCK_LOW_AND_ABOVE, which blocks any content that has a low or higher probability of being derogatory or violent.

import google.generativeai as genai
from google.generativeai.types import safety_types

completion = genai.generate_text(
    model=model,
    prompt=prompt,
    safety_settings=[
        # Block content with a low or higher probability of being derogatory.
        {
            "category": safety_types.HarmCategory.HARM_CATEGORY_DEROGATORY,
            "threshold": safety_types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        },
        # Block content with a low or higher probability of being violent.
        {
            "category": safety_types.HarmCategory.HARM_CATEGORY_VIOLENCE,
            "threshold": safety_types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        },
    ],
)
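
If nothing is blocked, you can read the top generated candidate from the completion object; in the Python client this is the result field:

# Top candidate text (None if the prompt or response was blocked).
print(completion.result)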

Response example

The following code snippet shows how to parse the safety feedback from the response. Note that the safety feedback will be empty unless the reason for blocking was one of the safety dimensions.

# First check the content filter reason.
for content_filter in completion.filters:
    print(content_filter["reason"])

# If any of the reasons is "safety", then the safety_feedback field will be
# populated.
for feedback in completion.safety_feedback:
    print(feedback["rating"])
    print(feedback["setting"])

Next steps

  • See the API reference to learn more about the full API.
  • Review the safety guidance for a general look at safety considerations when developing with LLMs.
  • Learn more about assessing probability versus severity from the Jigsaw team.
  • Learn more about the products that contribute to safety solutions like the Perspective API.
  • You can use these safety settings to create a toxicity classifier. See the classification example to get started.