Billing

This guide gives an overview of the Gemini API billing options, explains how to enable billing and monitor usage, and answers frequently asked questions (FAQs) about billing.

About billing

Billing for the Gemini API is based on two pricing tiers: free of charge (or free) and pay-as-you-go (or paid). Pricing and rate limits differ between these tiers and also vary by model. For current prices and rate limits, see Pricing. For a model-by-model breakdown of capabilities, see the Gemini models page.

Rate limits

Rate limits are quotas that limit the number of requests or tokens that you can send to the Gemini API in a given time window. Rate limits can apply per request or per token. Here's a fictitious example of quotas that might apply to a given model in a given tier:

  • 10 RPM (requests per minute)
  • 10,000 TPM (tokens per minute)
  • 1,000 RPD (requests per day)

In this example, if you exceed any one of these quotas (10 RPM, 10,000 TPM, or 1,000 RPD), the Gemini API service returns a 429: RESOURCE_EXHAUSTED error indicating that you've exceeded the rate limit.
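A common way to cope with a 429 response is to retry after an exponentially growing delay. The following is a minimal sketch of that pattern, not the behavior of any Gemini SDK: `RateLimitError` and `call_with_backoff` are hypothetical names, and a real client would need to map its own 429 exception type onto this pattern.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff on RateLimitError.

    send:         zero-argument callable that performs one API request.
    max_attempts: total number of tries before giving up.
    base_delay:   seconds to wait before the first retry; doubles on each retry.
    sleep:        injectable for testing; defaults to time.sleep.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential delay plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Keeping `max_attempts` small is deliberate in this sketch, because each retry is itself a request that can count toward the same quotas.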

When you enable billing and use the paid tier, you benefit from higher rate limits, and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service.

Cloud Billing

The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support.

Enable billing

You can enable Cloud Billing starting from Google AI Studio:

  1. Open Google AI Studio.

  2. At the bottom of the left sidebar, select Settings > Plan information.

  3. Click Set up Billing for your chosen project to enable Cloud Billing.

Monitor usage

After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console. The service name for the API is generativelanguage.googleapis.com, and in the console the Gemini API is also referred to as the Generative Language API.

To learn more, see the Google Cloud documentation on monitoring API usage.

Frequently asked questions

This section provides answers to frequently asked questions.

What am I billed for?

Gemini API pricing is based on the following:

  • Input token count
  • Output token count
  • Cached token count
  • Cached token storage duration

For pricing information, see the pricing page.
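To see how the four billing dimensions above could combine into a single estimate, here is a rough sketch. All rates are made-up placeholders, not Gemini API prices, and the `Rates` and `estimate_cost` names are hypothetical. The sketch also assumes that cached tokens are billed separately from regular input tokens and that storage is billed per million tokens per hour; check the pricing page for the actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Illustrative prices in USD; NOT real Gemini API prices."""
    input_per_million: float
    output_per_million: float
    cached_per_million: float
    cache_storage_per_million_per_hour: float  # assumed billing unit


def estimate_cost(rates, input_tokens, output_tokens, cached_tokens=0, cache_hours=0.0):
    """Return an estimated charge in USD for one workload."""
    million = 1_000_000
    usage_cost = (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
        + cached_tokens * rates.cached_per_million
    ) / million
    storage_cost = (
        cached_tokens * cache_hours * rates.cache_storage_per_million_per_hour
    ) / million
    return usage_cost + storage_cost


# Made-up rates, for illustration only.
example_rates = Rates(
    input_per_million=1.00,
    output_per_million=2.00,
    cached_per_million=0.25,
    cache_storage_per_million_per_hour=0.50,
)
cost = estimate_cost(
    example_rates,
    input_tokens=200_000,
    output_tokens=50_000,
    cached_tokens=1_000_000,
    cache_hours=2,
)
print(f"Estimated cost: ${cost:.2f}")
```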

Where can I view my quota?

You can view your quota and system limits in the Google Cloud console.

Can I use the Gemini API for free in the EEA (including EU), the UK, and CH?

There are two sets of models available through the API:

  1. Paid models, which include Gemini 1.5 Flash, Gemini 1.5 Pro, and Gemini 1.0 Pro. These models will not have a free tier available in the EEA (including EU), the UK, and CH. To use them, developers must set up a billing account and pay for usage.
  2. Models that may be accessible for free in the Gemini API. See ai.google.dev/pricing for details on which models are charged for; the other models are free. However, you still need to set up a billing account to use these models.

If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage?

No. Google AI Studio usage remains free of charge whether or not you set up billing. This applies in all supported regions, including the EEA (including EU), the UK, and CH.

Can I use 1M tokens in the free tier?

The free tier for the Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways:

  • In Google AI Studio
  • With pay-as-you-go plans
  • With free-of-charge plans for select models

See the latest free-of-charge rate limits per model on the pricing page.

How can I calculate the number of tokens I'm using?

Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens.
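As a rough illustration, the helper below checks a prompt against a token budget before sending it. `fits_token_budget` is a hypothetical name. It accepts any object that behaves like the Python SDK's `GenerativeModel`, and it assumes that `count_tokens` returns an object with a `total_tokens` attribute. The model name and API key in the commented usage are placeholders.

```python
def fits_token_budget(model, prompt, max_tokens):
    """Return (token_count, fits) for a prompt.

    model is expected to behave like the SDK's GenerativeModel:
    count_tokens(prompt) returns an object with a total_tokens
    attribute (an assumption of this sketch).
    """
    total = model.count_tokens(prompt).total_tokens
    return total, total <= max_tokens


# Usage with the google-generativeai package (needs an API key and
# network access; the model name is only an example):
#
#   import google.generativeai as genai
#   genai.configure(api_key="YOUR_API_KEY")
#   model = genai.GenerativeModel("gemini-1.5-flash")
#   print(fits_token_budget(model, "The quick brown fox", max_tokens=10_000))
```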

How is billing handled?

Billing for the Gemini API is handled by the Cloud Billing system.

Am I charged for failed requests?

If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota.
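The distinction between billing and quota can be mirrored in a small client-side tally, sketched below. `UsageTally` is a hypothetical helper, and treating every 4xx or 5xx status code as a failed request is an assumption of the sketch. The service's own accounting remains authoritative.

```python
from dataclasses import dataclass


@dataclass
class UsageTally:
    """Client-side estimate only; the service's accounting is authoritative."""
    requests_counted: int = 0  # every attempt, successful or not
    billable_tokens: int = 0   # tokens from successful requests only

    def record(self, status_code, tokens):
        """Record one request attempt and the tokens it used."""
        # Failed requests still count against quota.
        self.requests_counted += 1
        # Assumption: any 4xx or 5xx status means the request failed,
        # so its tokens are not billed.
        if status_code < 400:
            self.billable_tokens += tokens


tally = UsageTally()
tally.record(200, 1200)  # success: counted and billed
tally.record(500, 800)   # failure: counted, not billed
print(tally)
```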

Is there a charge for fine-tuning the models?

Model tuning is free, but inference on tuned models is charged at the same rate as the base models.

Is GetTokens billed?

Requests to the GetTokens API are not billed, and they don't count against inference quota.

Where can I get help with billing?

To get help with billing, see Get Cloud Billing support.