

Gemini 3.1 Flash-Lite

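Gemini 3.1 Flash-Lite is a low-latency, cost-efficient multimodal model optimized for high-frequency, lightweight tasks. It accepts text, image, video, audio, and PDF inputs, and is designed for high-volume agentic workflows, simple data extraction, and applications where latency and API cost are the primary constraints.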


gemini-3.1-flash-lite

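Property	Description
Model code	`gemini-3.1-flash-lite`
Supported data types	Inputs: text, image, video, audio, PDF. Output: text.
Token limits	Input token limit: 1,048,576. Output token limit: 65,536.
Capabilities	Audio generation: not supported. Caching: supported. Code execution: supported. Computer use: not supported. File search: supported. Function calling: supported. Grounding with Google Maps: supported. Image generation: not supported. Live API: not supported. Search grounding: supported. Structured outputs: supported. Thinking: supported. URL context: supported.
Consumption options	Batch API: supported. Flex inference: supported. Priority inference: supported.
Versions	See the model version patterns for details. Stable: `gemini-3.1-flash-lite`
Latest update	May 2026
Knowledge cutoff	January 2025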

Developer guide

Gemini 3.1 Flash-Lite is best suited to simple tasks at high volume. The following use cases are a good fit for it.

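Translation: Fast, low-cost, high-volume translation, such as processing chat messages, reviews, and support tickets at scale. You can use system instructions to restrict the output to the translated text only, with no additional commentary.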

from google import genai

client = genai.Client()
text = "Hey, are you down to grab some pizza later? I'm starving!"

response = client.models.generate_content(
    model="gemini-3.1-flash-lite",
    config={
        "system_instruction": "Only output the translated text"
    },
    contents=f"Translate the following text to German: {text}"
)

print(response.text)
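
For high-volume translation, it can help to separate building each request from sending it. The following is a minimal sketch, not an official pattern: `build_translation_request` and the sample messages are hypothetical, and the returned dictionary simply mirrors the keyword arguments passed to `generate_content` in the example above.

```python
MODEL = "gemini-3.1-flash-lite"
SYSTEM_INSTRUCTION = "Only output the translated text"


def build_translation_request(text: str, target_language: str) -> dict:
    """Return keyword arguments for client.models.generate_content()."""
    return {
        "model": MODEL,
        "config": {"system_instruction": SYSTEM_INSTRUCTION},
        "contents": f"Translate the following text to {target_language}: {text}",
    }


# Hypothetical sample messages; in practice these come from your own data.
messages = [
    "Where is my order?",
    "The app crashes when I open settings.",
]

translation_requests = [build_translation_request(m, "German") for m in messages]

for request in translation_requests:
    # With a configured client, each request could be sent as:
    #   response = client.models.generate_content(**request)
    print(request["contents"])
```

Keeping the system instruction in one constant means every message in the batch is translated under the same output constraint.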

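Transcription: Process recordings, voice memos, or any other audio content that needs a text transcript, without running a separate speech-to-text pipeline. Because the model supports multimodal input, you can pass audio files directly for transcription.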

from google import genai

client = genai.Client()

# URL = "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3"
# Upload the audio file to the GenAI File API
uploaded_file = client.files.upload(file='sample.mp3')

prompt = 'Generate a transcript of the audio.'

response = client.models.generate_content(
  model="gemini-3.1-flash-lite",
  contents=[prompt, uploaded_file]
)

print(response.text)

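Lightweight agentic tasks and data extraction: Lightweight data processing pipelines that rely on entity extraction, classification, and structured JSON output. For example, extracting structured data from e-commerce customer reviews.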

from google import genai
from pydantic import BaseModel, Field

client = genai.Client()

prompt = "Analyze the user review and determine the aspect, sentiment score, summary quote, and return risk"
input_text = "The boots look amazing and the leather is high quality, but they run way too small. I'm sending them back."

class ReviewAnalysis(BaseModel):
    aspect: str = Field(description="The feature mentioned (e.g., Price, Comfort, Style, Shipping)")
    summary_quote: str = Field(description="The specific phrase from the review about this aspect")
    sentiment_score: int = Field(description="1 to 5 (1=worst, 5=best)")
    is_return_risk: bool = Field(description="True if the user mentions returning the item")

response = client.models.generate_content(
    model="gemini-3.1-flash-lite",
    contents=[prompt, input_text],
    config={
        "response_mime_type": "application/json",
        "response_json_schema": ReviewAnalysis.model_json_schema(),
    },
)

print(response.text)
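
Because the response text is JSON that follows the requested schema, a pipeline can turn it into a typed object before acting on it. The sketch below uses only the standard library; `ParsedReview`, `parse_review_analysis`, and the sample JSON string are hypothetical and hand-written for illustration, not real model output. If Pydantic is already in use, as in the example above, `ReviewAnalysis.model_validate_json(response.text)` would be a more direct alternative.

```python
import json
from dataclasses import dataclass


@dataclass
class ParsedReview:
    aspect: str
    summary_quote: str
    sentiment_score: int
    is_return_risk: bool


def parse_review_analysis(raw_json: str) -> ParsedReview:
    """Parse the JSON response text and check the score range (1 to 5)."""
    data = json.loads(raw_json)
    review = ParsedReview(
        aspect=data["aspect"],
        summary_quote=data["summary_quote"],
        sentiment_score=data["sentiment_score"],
        is_return_risk=data["is_return_risk"],
    )
    if not 1 <= review.sentiment_score <= 5:
        raise ValueError(f"sentiment_score out of range: {review.sentiment_score}")
    return review


# Hypothetical response text, written by hand for illustration only.
sample_response_text = json.dumps({
    "aspect": "Fit",
    "summary_quote": "they run way too small",
    "sentiment_score": 2,
    "is_return_risk": True,
})

analysis = parse_review_analysis(sample_response_text)
if analysis.is_return_risk:
    print(f"Return risk flagged for aspect: {analysis.aspect}")
```

The range check is a design choice in this sketch: the schema description asks for a score from 1 to 5, so values outside that range are rejected rather than passed downstream.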

Document processing and summarization: Parse PDFs and return concise summaries, for example when building a document processing pipeline or quickly triaging incoming files.

from google import genai
from google.genai import types
import httpx

client = genai.Client()

# Download a sample PDF document
doc_url = "https://storage.googleapis.com/generativeai-downloads/data/med_gemini.pdf"
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
    model="gemini-3.1-flash-lite",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type='application/pdf',
        ),
        prompt
    ]
)

print(response.text)

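Model routing: Use a low-latency, low-cost model as a classifier that routes each query to the appropriate model based on task complexity. This is a real production pattern: the open-source Gemini CLI uses Flash-Lite to classify task complexity and then routes to Flash or Pro accordingly.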

from google import genai

client = genai.Client()

FLASH_MODEL = 'flash'
PRO_MODEL = 'pro'

CLASSIFIER_SYSTEM_PROMPT = f"""
You are a specialized Task Routing AI. Your sole function is to analyze the user's request and classify its complexity. Choose between `{FLASH_MODEL}` (SIMPLE) or `{PRO_MODEL}` (COMPLEX).
1.  `{FLASH_MODEL}`: A fast, efficient model for simple, well-defined tasks.
2.  `{PRO_MODEL}`: A powerful, advanced model for complex, open-ended, or multi-step tasks.

A task is COMPLEX if it meets ONE OR MORE of the following criteria:
1.  High Operational Complexity (Est. 4+ Steps/Tool Calls)
2.  Strategic Planning and Conceptual Design
3.  High Ambiguity or Large Scope
4.  Deep Debugging and Root Cause Analysis

A task is SIMPLE if it is highly specific, bounded, and has Low Operational Complexity (Est. 1-3 tool calls).
"""

user_input = "I'm getting an error 'Cannot read property 'map' of undefined' when I click the save button. Can you fix it?"

response_schema = {
  "type": "object",
  "properties": {
    "reasoning": {
      "type": "string",
      "description": "A brief, step-by-step explanation for the model choice, referencing the rubric."
    },
    "model_choice": {
      "type": "string",
      "enum": [FLASH_MODEL, PRO_MODEL]
    }
  },
  "required": ["reasoning", "model_choice"]
}

response = client.models.generate_content(
    model="gemini-3.1-flash-lite",
    contents=user_input,
    config={
        "system_instruction": CLASSIFIER_SYSTEM_PROMPT,
        "response_mime_type": "application/json",
        "response_json_schema": response_schema
    },
)

print(response.text)
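
The classifier only returns a label, so the calling code still has to turn that label into an actual model for the follow-up request. The following is a minimal sketch under stated assumptions: `resolve_model` is a hypothetical helper, the model IDs in `MODEL_IDS` are placeholders to replace with the models you actually use, and falling back to the larger model on unparseable output is an illustrative choice rather than documented behavior.

```python
import json

# Placeholder model IDs (assumptions); substitute the models you route to.
MODEL_IDS = {
    "flash": "your-flash-model-id",
    "pro": "your-pro-model-id",
}
DEFAULT_LABEL = "pro"  # assumed fallback: prefer capability over cost


def resolve_model(classifier_json: str) -> str:
    """Map the classifier's JSON output to a model ID, with a safe fallback."""
    try:
        parsed = json.loads(classifier_json)
    except json.JSONDecodeError:
        parsed = None
    choice = parsed.get("model_choice") if isinstance(parsed, dict) else None
    if not isinstance(choice, str):
        choice = DEFAULT_LABEL
    return MODEL_IDS.get(choice, MODEL_IDS[DEFAULT_LABEL])


# Hypothetical classifier output, written by hand for illustration only.
sample_output = '{"reasoning": "Single bounded fix.", "model_choice": "flash"}'
print(resolve_model(sample_output))
```

In the example above, `response.text` would take the place of `sample_output`, and the returned ID would be passed as `model` in the next `generate_content` call.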

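Thinking: To improve accuracy on tasks that benefit from step-by-step reasoning, configure thinking so that the model spends additional compute on internal reasoning before it produces its final output.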

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.1-flash-lite",
    contents="How does AI work?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)

print(response.text)