Context caching

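In a typical AI workflow, you might pass the same input tokens to a model over and over. With the Gemini API context caching feature, you can send some content to the model once, cache the input tokens, and then refer to the cached tokens in subsequent requests. In some cases, using cached tokens costs less than passing in the same set of tokens repeatedly.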

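When you cache a set of tokens, you can choose how long the cache should exist before the tokens are automatically deleted. This caching duration is called the time to live (TTL). If you don't set it, the TTL defaults to 1 hour. The cost of caching depends on the size of the input tokens and on how long you want the tokens to persist.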
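As a rough illustration of how a TTL maps to an expiry time, the sketch below uses only the Python standard library. The helper `cache_expiry` and the creation timestamp are hypothetical and are not part of the Gemini SDK; the only value taken from the text above is the 1 hour default.

```python
import datetime

# Default TTL stated in the text above.
DEFAULT_TTL = datetime.timedelta(hours=1)


def cache_expiry(created_at, ttl=None):
    """Return when a cache created at `created_at` would expire.

    Hypothetical helper for illustration; falls back to the default TTL
    when no TTL is given.
    """
    return created_at + (ttl if ttl is not None else DEFAULT_TTL)


# Made-up creation time, used only to make the example deterministic.
created = datetime.datetime(2024, 6, 1, 12, 0, tzinfo=datetime.timezone.utc)

print(cache_expiry(created))                                 # default TTL
print(cache_expiry(created, datetime.timedelta(minutes=5)))  # explicit TTL
```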

Context caching supports both Gemini 1.5 Pro and Gemini 1.5 Flash.

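When to use context caching

Context caching is particularly well suited to scenarios where shorter requests repeatedly refer to a large initial context. Consider using context caching for use cases such as:

Chatbots with extensive system instructions
Repeated analysis of lengthy video files
Recurring queries against large document sets
Frequent code repository analysis or bug fixing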

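How caching reduces costs

Context caching is a paid feature designed to reduce overall operational costs. Billing is based on the following factors:

Cached token count: the number of input tokens cached, billed at a reduced rate when they are included in subsequent prompts.
Storage duration: how long the cached tokens are stored (the TTL), billed based on the TTL duration for the number of cached tokens. There is no minimum or maximum TTL.
Other factors: other charges still apply, such as for non-cached input tokens and for output tokens.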
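The billing factors above can be combined into a back-of-the-envelope comparison. The following is a minimal sketch, not a pricing calculator: the `Rates` class, both functions, and every price in it are placeholders invented for illustration, and it ignores any one-time charge for creating the cache. Substitute the figures from the pricing page before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder prices in dollars per million tokens (not real pricing)."""
    input_per_m: float         # regular input tokens
    cached_input_per_m: float  # cached input tokens (reduced rate)
    output_per_m: float        # output tokens
    storage_per_m_hour: float  # cache storage, per hour of TTL


def cost_without_cache(rates, shared_tokens, new_tokens, output_tokens, requests):
    """Every request resends the shared content at the regular input rate."""
    per_request = ((shared_tokens + new_tokens) * rates.input_per_m
                   + output_tokens * rates.output_per_m) / 1e6
    return requests * per_request


def cost_with_cache(rates, shared_tokens, new_tokens, output_tokens,
                    requests, ttl_hours):
    """Shared content is billed at the cached rate, plus storage for the TTL."""
    per_request = (shared_tokens * rates.cached_input_per_m
                   + new_tokens * rates.input_per_m
                   + output_tokens * rates.output_per_m) / 1e6
    storage = shared_tokens * rates.storage_per_m_hour * ttl_hours / 1e6
    return requests * per_request + storage


# Made-up rates chosen to keep the arithmetic easy to follow.
rates = Rates(input_per_m=1.0, cached_input_per_m=0.25,
              output_per_m=2.0, storage_per_m_hour=1.0)

for n in (1, 10):
    print(n,
          cost_without_cache(rates, 1_000_000, 0, 0, n),
          cost_with_cache(rates, 1_000_000, 0, 0, n, ttl_hours=1))
```

With these made-up rates, a single request is cheaper without a cache (1.00 versus 1.25), while ten requests within the hour are cheaper with one (10.00 versus 3.50). This is why caching pays off only when the cached content is reused often enough.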

For up-to-date pricing details, see the Gemini API pricing page. To learn how to count tokens, see the Token guide.

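How to use context caching

This section assumes that you've installed a Gemini SDK (or have curl installed) and that you've configured an API key, as shown in the quickstart.

Generate content using a cache

The following example shows how to generate content using a cached system instruction and video file.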

import os
import google.generativeai as genai
from google.generativeai import caching
import datetime
import time

# Get your API key from https://aistudio.google.com/app/apikey
# and access your API key as an environment variable.
# To authenticate from a Colab, see
# https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb
genai.configure(api_key=os.environ['API_KEY'])

# Download video file
# curl -O https://storage.googleapis.com/generativeai-downloads/data/Sherlock_Jr_FullMovie.mp4

path_to_video_file = 'Sherlock_Jr_FullMovie.mp4'

# Upload the video using the Files API
video_file = genai.upload_file(path=path_to_video_file)

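# Wait for the file to finish processing
while video_file.state.name == 'PROCESSING':
  print('Waiting for video to be processed.')
  time.sleep(2)
  video_file = genai.get_file(video_file.name)

# Stop early if processing failed; a failed file cannot be cached
if video_file.state.name == 'FAILED':
  raise ValueError(f'Video processing failed: {video_file.name}')

print(f'Video processing complete: {video_file.uri}')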

# Create a cache with a 5 minute TTL
cache = caching.CachedContent.create(
    model='models/gemini-1.5-flash-001',
    display_name='sherlock jr movie', # used to identify the cache
    system_instruction=(
        'You are an expert video analyzer, and your job is to answer '
        'the user\'s query based on the video file you have access to.'
    ),
    contents=[video_file],
    ttl=datetime.timedelta(minutes=5),
)

# Construct a GenerativeModel which uses the created cache.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

# Query the model
response = model.generate_content([(
    'Introduce different characters in the movie by describing '
    'their personality, looks, and names. Also list the timestamps '
    'they were introduced for the first time.')])

print(response.usage_metadata)

# The output should look something like this:
#
# prompt_token_count: 696219
# cached_content_token_count: 696190
# candidates_token_count: 214
# total_token_count: 696433

print(response.text)
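To see how much of a prompt was served from the cache, the counts in the usage metadata can be subtracted. The helper below is a hypothetical sketch that works on plain integers; the numbers are the ones from the sample output above.

```python
def uncached_prompt_tokens(prompt_token_count, cached_content_token_count):
    """Return the number of prompt tokens that did not come from the cache."""
    return prompt_token_count - cached_content_token_count


# Values copied from the sample usage metadata shown above.
prompt_tokens = 696219
cached_tokens = 696190

print(uncached_prompt_tokens(prompt_tokens, cached_tokens))  # 29
```

With a real response, the same two values would be read from `response.usage_metadata.prompt_token_count` and `response.usage_metadata.cached_content_token_count`, the field names that appear in the sample output.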

List caches

You can't retrieve or view cached content, but you can retrieve cache metadata (name, model, display_name, usage_metadata, create_time, update_time, and expire_time).

To list metadata for all uploaded caches, use CachedContent.list():

for c in caching.CachedContent.list():
  print(c)
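Since listing returns metadata only, a common follow-up is to pick out a cache by its display_name. The sketch below assumes nothing beyond objects that expose a `display_name` attribute; `find_by_display_name` is a hypothetical helper, and the stand-in objects and their `name` values are made up so the example runs without network access.

```python
from types import SimpleNamespace


def find_by_display_name(caches, display_name):
    """Return the caches whose display_name matches exactly."""
    return [c for c in caches if c.display_name == display_name]


# Stand-in metadata objects; the name values are invented for illustration.
fake_caches = [
    SimpleNamespace(name='cachedContents/aaa', display_name='sherlock jr movie'),
    SimpleNamespace(name='cachedContents/bbb', display_name='other'),
]

matches = find_by_display_name(fake_caches, 'sherlock jr movie')
print([c.name for c in matches])
```

In real use, the first argument would presumably be the result of `caching.CachedContent.list()` rather than the stand-in list.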

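Update a cache

You can set a new ttl or expire_time for a cache. Changing anything else about the cache isn't supported.

The following example shows how to update the ttl of a cache using CachedContent.update().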

import datetime

cache.update(ttl=datetime.timedelta(hours=2))
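One way to decide when to extend a TTL is to check how much lifetime is left. The sketch below is an assumption-laden illustration: `time_remaining` and `needs_refresh` are hypothetical helpers, and it assumes the cache's expire_time is available as a timezone-aware `datetime`.

```python
import datetime


def time_remaining(expire_time, now=None):
    """Return how long until `expire_time`, clamped at zero."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return max(expire_time - now, datetime.timedelta(0))


def needs_refresh(expire_time, threshold, now=None):
    """Return True when less than `threshold` of lifetime is left."""
    return time_remaining(expire_time, now) < threshold


# Fixed, made-up timestamps so the example is deterministic.
utc = datetime.timezone.utc
now = datetime.datetime(2024, 6, 1, 12, 0, tzinfo=utc)
expires = datetime.datetime(2024, 6, 1, 12, 3, tzinfo=utc)

print(time_remaining(expires, now))
print(needs_refresh(expires, datetime.timedelta(minutes=5), now))
```

If `needs_refresh` returns True for a real cache, a call like the `cache.update(ttl=...)` shown above would extend its lifetime.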

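Delete a cache

The caching service provides a delete operation for manually removing content from the cache. The following example shows how to delete a cache using CachedContent.delete().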

cache.delete()

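Additional considerations

Keep the following considerations in mind when using context caching:

The minimum input token count for context caching is 32,768, and the maximum is the same as the maximum for the given model. (For more on counting tokens, see the Token guide.)
The model doesn't make any distinction between cached tokens and regular input tokens. Cached content is simply a prefix to the prompt.
There are no special rate or usage limits on context caching. The standard rate limits for GenerateContent apply, and token limits include cached tokens.
The number of cached tokens is returned in the usage_metadata from the create, get, and list operations of the cache service, and also in GenerateContent when the cache is used.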
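The size limits in the first consideration can be checked before attempting to create a cache. This is a minimal sketch: `can_cache` is a hypothetical helper, and the model limit passed in the example is a placeholder value, not a documented figure. Only the 32,768 minimum comes from the text above.

```python
# Minimum cacheable input size stated above.
MIN_CACHE_TOKENS = 32_768


def can_cache(token_count, model_max_tokens):
    """Return True if `token_count` falls within the cacheable range.

    `model_max_tokens` is the limit of the model in use; look it up for
    your model rather than relying on the placeholder below.
    """
    return MIN_CACHE_TOKENS <= token_count <= model_max_tokens


PLACEHOLDER_MODEL_MAX = 1_000_000  # illustrative only

print(can_cache(696_190, PLACEHOLDER_MODEL_MAX))  # within the range
print(can_cache(10_000, PLACEHOLDER_MODEL_MAX))   # below the minimum
```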