This guide describes the adjustable safety settings available for the Gemini API. During the prototyping stage, you can adjust safety settings on four dimensions to quickly assess whether your application requires a more or less restrictive configuration. By default, safety settings block content (including prompts) with medium or higher probability of being unsafe across any dimension. This baseline safety is designed to work for most use cases, so you should only adjust your safety settings if your application consistently requires it.
In addition to the adjustable safety filters, the Gemini API has built-in protections against core harms, such as content that endangers child safety. These types of harm are always blocked and cannot be adjusted.
The adjustable safety filters cover the following categories:
- Harassment
- Hate speech
- Sexually explicit
- Dangerous
These settings allow you, the developer, to determine what is appropriate for your use case. For example, if you're building dialogue for a video game, you may deem it acceptable to allow more content that's rated as dangerous due to the nature of the game. Here are a few other example use cases that may need some flexibility in these safety settings:
| Use case | Categories |
|---|---|
| Anti-Harassment Training App | Hate speech, Sexually explicit |
| | Sexually explicit, Dangerous |
Probability versus severity
The Gemini API blocks content based on the probability of content being unsafe, not its severity. This is important to consider because some content can have a low probability of being unsafe even though the severity of harm could still be high. For example, compare the following sentences:
- The robot punched me.
- The robot slashed me up.
Sentence 1 might result in a higher probability of being unsafe, but you might consider sentence 2 to be of higher severity in terms of violence.
Given this, each developer should carefully test and decide what level of blocking is needed to support their key use cases while minimizing harm to end users.
Safety settings are part of the request you send to the text service, and they can be adjusted for each request you make to the API. The following table lists the categories that you can set and describes the type of harm that each category encompasses.
| Category | Description |
|---|---|
| Harassment | Negative or harmful comments targeting identity and/or protected attributes. |
| Hate speech | Content that is rude, disrespectful, or profane. |
| Sexually explicit | Contains references to sexual acts or other lewd content. |
| Dangerous | Promotes, facilitates, or encourages harmful acts. |
These definitions are also in the API reference. The Gemini models only support HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_SEXUALLY_EXPLICIT, and HARM_CATEGORY_DANGEROUS_CONTENT. The other categories are used by PaLM 2 models.
The following table describes the block settings you can adjust for each category. For example, if you set the block setting to Block few for the Hate speech category, everything that has a high probability of being hate speech content is blocked, but anything with a lower probability is allowed.
If not set, the default block setting is Block some for all categories.
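| Threshold (Google AI Studio) | Threshold (API) | Description |
|---|---|---|
| Block none | BLOCK_NONE | Always show regardless of probability of unsafe content |
| Block few | BLOCK_ONLY_HIGH | Block when high probability of unsafe content |
| Block some | BLOCK_MEDIUM_AND_ABOVE | Block when medium or high probability of unsafe content |
| Block most | BLOCK_LOW_AND_ABOVE | Block when low, medium or high probability of unsafe content |
| N/A | HARM_BLOCK_THRESHOLD_UNSPECIFIED | Threshold is unspecified, block using default threshold |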
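The threshold semantics described above can be modeled in a few lines of plain Python. This is a minimal sketch, not the service's implementation. The `Probability` ordering, the `MIN_BLOCKED` mapping, and the `is_blocked` helper are hypothetical names derived from the threshold descriptions. An unspecified threshold falls back to the default, Block some.

```python
from enum import IntEnum


class Probability(IntEnum):
    """Harm probability levels, ordered from least to most likely unsafe."""
    NEGLIGIBLE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical mapping: lowest probability each threshold blocks
# (None means the threshold never blocks). Derived from the table text.
MIN_BLOCKED = {
    "BLOCK_NONE": None,
    "BLOCK_ONLY_HIGH": Probability.HIGH,
    "BLOCK_MEDIUM_AND_ABOVE": Probability.MEDIUM,
    "BLOCK_LOW_AND_ABOVE": Probability.LOW,
}
DEFAULT_THRESHOLD = "BLOCK_MEDIUM_AND_ABOVE"  # "Block some"


def is_blocked(probability: Probability,
               threshold: str = "HARM_BLOCK_THRESHOLD_UNSPECIFIED") -> bool:
    """Return True if content at this probability level would be blocked."""
    if threshold == "HARM_BLOCK_THRESHOLD_UNSPECIFIED":
        threshold = DEFAULT_THRESHOLD  # unspecified -> default threshold
    minimum = MIN_BLOCKED[threshold]
    return minimum is not None and probability >= minimum


print(is_blocked(Probability.MEDIUM, "BLOCK_ONLY_HIGH"))  # False
print(is_blocked(Probability.MEDIUM))                     # True (default)
print(is_blocked(Probability.HIGH, "BLOCK_NONE"))         # False
```

The check looks only at the probability level. It would therefore treat "The robot punched me." and "The robot slashed me up." identically if both received the same probability, regardless of how severe each one is.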
You can set these settings for each request you make to the text service. See the API reference for details.
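Safety settings travel with each request, so one way to picture them is as part of the JSON body of a REST `generateContent` call. That body carries a `safetySettings` list of `category`/`threshold` pairs. The sketch below assumes that request format. It only builds and validates the body with the standard library and does not send anything. The `build_request_body` helper is hypothetical. Its validation sets are copied from the category and threshold tables in this guide.

```python
import json

# The four categories the Gemini models support (from the category table).
GEMINI_CATEGORIES = {
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
}
# API threshold values (from the threshold table).
THRESHOLDS = {
    "HARM_BLOCK_THRESHOLD_UNSPECIFIED",
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
}


def build_request_body(prompt: str, settings: dict[str, str]) -> dict:
    """Build a generateContent-style JSON body with per-request settings.

    Hypothetical helper. Categories left out of `settings` are simply
    not sent, so they keep the default block setting (Block some).
    """
    for category, threshold in settings.items():
        if category not in GEMINI_CATEGORIES:
            raise ValueError(f"not a Gemini safety category: {category}")
        if threshold not in THRESHOLDS:
            raise ValueError(f"unknown threshold: {threshold}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "safetySettings": [
            {"category": c, "threshold": t} for c, t in settings.items()
        ],
    }


# Example: relax only the Dangerous category, as a game might.
body = build_request_body(
    "Write a villain's monologue for my game.",
    {"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_ONLY_HIGH"},
)
print(json.dumps(body, indent=2))
```

Rejecting PaLM 2 categories such as HARM_CATEGORY_TOXICITY on the client is a design choice of this sketch. It surfaces a configuration mistake before any request is sent.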
Response candidate feedback is included in finishReason and safetyRatings. If response content was blocked and the finishReason was SAFETY, you can inspect safetyRatings for more details. The safety rating includes the category and the probability of the harm classification. The content that was blocked is not returned.
The returned probabilities correspond to the block confidence levels shown in the following table:
| Probability | Description |
|---|---|
| NEGLIGIBLE | Content has a negligible probability of being unsafe |
| LOW | Content has a low probability of being unsafe |
| MEDIUM | Content has a medium probability of being unsafe |
| HIGH | Content has a high probability of being unsafe |
For example, if the content was blocked because the harassment category had a high probability, the returned safety rating would have its category set to HARASSMENT and its harm probability set to HIGH.
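The feedback fields described above can be illustrated with a minimal sketch that inspects a REST-style response dictionary. It assumes the JSON field names `promptFeedback`, `blockReason`, `candidates`, `finishReason`, `safetyRatings`, `category`, and `probability`. The `summarize_feedback` helper and the sample response are hypothetical. The sample mirrors the harassment example above and is not real API output.

```python
def summarize_feedback(response: dict) -> str:
    """Explain whether a generateContent-style response was blocked."""
    # A blocked prompt is reported in promptFeedback.blockReason.
    block_reason = response.get("promptFeedback", {}).get("blockReason")
    if block_reason:
        return f"prompt blocked: {block_reason}"
    candidates = response.get("candidates", [])
    if not candidates:
        return "no candidates returned"
    candidate = candidates[0]
    if candidate.get("finishReason") == "SAFETY":
        # The blocked content is not returned, but the ratings explain why.
        ratings = [
            f'{r["category"]}={r["probability"]}'
            for r in candidate.get("safetyRatings", [])
        ]
        return "response blocked: " + ", ".join(ratings)
    return "not blocked"


# Hypothetical response shaped like the harassment example in the text.
sample = {
    "candidates": [{
        "finishReason": "SAFETY",
        "safetyRatings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "probability": "HIGH"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE"},
        ],
    }],
}
print(summarize_feedback(sample))
```

The prompt-level check comes first. A blocked prompt produces no candidates, so indexing `candidates[0]` without that check could fail.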
Safety settings in Google AI Studio
You can also adjust safety settings in Google AI Studio, but you cannot turn them off. To adjust them, click Edit safety settings in the Run settings, and then use the knobs to adjust each setting.
A No Content message appears if the content is blocked. To see more details, hold the pointer over No Content and click Safety.
This section shows how to use the safety settings in code with the Python client library.
The following Python code snippet shows how to set safety settings in a generate_content call. It sets the Harassment and Hate speech harm categories to BLOCK_LOW_AND_ABOVE, which blocks any content that has a low or higher probability of being harassment or hate speech.
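import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
# Assumes genai.configure(api_key=...) has been called and img is a PIL image.
model = genai.GenerativeModel(model_name='gemini-pro-vision')
response = model.generate_content(
    ['Do these look store-bought or homemade?', img],
    safety_settings={HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
                     HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE})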
The following shows a code snippet for parsing the safety feedback from the response.
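# If the response doesn't contain text, check if the prompt was blocked.
print(response.prompt_feedback)
# Also check the finish reason to see if the response was blocked.
print(response.candidates[0].finish_reason)
# If the finish reason was SAFETY, the safety ratings have more details.
print(response.candidates[0].safety_ratings)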
- See the API reference to learn more about the full API.
- Review the safety guidance for a general look at safety considerations when developing with LLMs.
- Learn more about assessing probability versus severity from the Jigsaw team.
- Learn more about the products that contribute to safety solutions like the Perspective API.
- You can use these safety settings to create a toxicity classifier. See the classification example to get started.