The Interactions API is now generally available. We recommend using this API to access all the latest features and models.

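Generate and edit videos with Gemini Omni Flash

Gemini Omni Flash (gemini-omni-flash-preview) is a high-performance multimodal model designed for high-speed video generation, editing, and cinematic control. Gemini Omni is built on the following core capabilities, which set it apart from earlier video models:

Native multimodality: It processes text, images, audio, and video together, giving you more cohesive, consistent, and controllable output.
Conversational editing: With the Interactions API, you can refine and edit videos iteratively through natural-language conversation. Describe what you want to change, and the model applies the edit while preserving the parts of the video you want to keep.
World knowledge: Gemini Omni combines an understanding of physics with Gemini's knowledge of history, science, and cultural context, bridging the gap between photorealism and meaningful storytelling.

Text-to-video

Generate a video from a text prompt. The model produces a video with audio based on your description. For best results, include details such as the scene description, camera movement, lighting, and mood.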

Python

import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A marble rolling fast on a chain reaction style track, continuous smooth shot."
)
with open("marble.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

const interaction = await ai.interactions.create({  
  model: 'gemini-omni-flash-preview',  
  input: 'A marble rolling fast on a chain reaction style track, continuous smooth shot.',
});

if (interaction.output_video?.data) {
  fs.writeFileSync('marble.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": "A marble rolling fast on a chain reaction style track, continuous smooth shot."
}'

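REST response schema

The interaction.output_video convenience field is available only in the SDKs. When you call the REST API directly, read the video output from the steps array.

Raw REST JSON structure: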

{
  "steps": [
    { "type": "user_input", "content": [{"type": "text", "text": "..."}] },
    { "type": "thought", "content": [{"text": "...", "type": "thought"}] },
    {
      "type": "model_output",
      "content": [
        {
          "type": "video",
          "mime_type": "video/mp4",
          "data": "AAAAIGZ0eXBpc29t..." // Base64 encoded video data
        }
      ]
    }
  ],
  "id": "v1_...",
  "status": "completed",
  "model": "gemini-omni-flash-preview",
  "object": "interaction"
}
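
With raw REST responses, the video has to be pulled out of the steps array by hand. The following is a minimal sketch, assuming the response has the shape shown above. The helper name extract_video_bytes is illustrative and not part of any SDK.

```python
import base64
from typing import Optional


def extract_video_bytes(response: dict) -> Optional[bytes]:
    """Return the decoded bytes of the first inline video in a raw REST response.

    Assumes the `steps` structure shown above. Returns None when no
    `model_output` step carries inline (base64) video data.
    """
    for step in response.get("steps", []):
        if step.get("type") != "model_output":
            continue
        for part in step.get("content", []):
            if part.get("type") == "video" and "data" in part:
                return base64.b64decode(part["data"])
    return None
```

A caller would typically write the returned bytes to an .mp4 file and treat None as "no inline video in this response".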

Control the aspect ratio

Set aspect_ratio to "9:16" to create a portrait video. The default is landscape (16:9).

Python

import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A futuristic city with neon lights and flying cars, cyberpunk style",
    response_format={
        "type": "video",  # optional
        "aspect_ratio": "9:16"  # Supported values: "9:16", "16:9"
    }
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A futuristic city with neon lights and flying cars, cyberpunk style',
  response_format: {
    type: 'video', // optional
    aspect_ratio: '9:16' // Supported values: '9:16', '16:9'
  },
});

if (interaction.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": "A futuristic city with neon lights and flying cars, cyberpunk style",
 "response_format": {
   "type": "video",
   "aspect_ratio": "9:16"
 }
}'
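
Because only two aspect ratios are listed as supported, it can help to validate the value before sending a request. The following sketch builds the response_format dictionary used in the samples above. The function and constant names are hypothetical.

```python
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16")


def video_response_format(aspect_ratio: str = "16:9") -> dict:
    """Build a `response_format` value for video output.

    Raises ValueError for any ratio other than the two values the
    samples above list as supported.
    """
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(
            f"Unsupported aspect ratio {aspect_ratio!r}; "
            f"expected one of {SUPPORTED_ASPECT_RATIOS}"
        )
    return {"type": "video", "aspect_ratio": aspect_ratio}
```

The result can be passed as the response_format argument, for example response_format=video_response_format("9:16").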

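Image-to-video

You can provide a reference image along with a text prompt. The model decides how to use the image based on your prompt. This is useful for bringing product photos, illustrations, or photographs to life.

The following example uses a drawing of a fish jumping out of the water as the reference image, with this prompt:

turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video

The result is a realistic video based on the drawing.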

Python

import base64
from google import genai

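client = genai.Client()

# Load and base64-encode the reference image (the file name is a placeholder).
with open("drawing.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")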

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        {"type": "image", "data": base64_image, "mime_type": "image/jpeg"},
        {"type": "text", "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"}
    ],
)
with open("clownfish.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
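const ai = new GoogleGenAI({});

// Load and base64-encode the reference image (the file name is a placeholder).
const base64Image = fs.readFileSync('drawing.jpg').toString('base64');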

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    { type: 'image', data: base64Image, mime_type: 'image/jpeg' },
    { type: 'text', text: 'turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video' }
  ]
});

if (interaction.output_video?.data) {
  fs.writeFileSync('clownfish.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": [
   {"type": "image", "data": "'"$BASE64_IMAGE"'", "mime_type": "image/jpeg"},
   {"type": "text", "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"}
 ]
}'

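Subject references

You can generate a video that features specific subjects supplied as reference images. For example, the following code provides two images, one of a cat and one of a ball of yarn, to generate a video of the cat playing with the yarn.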

Python

import base64
from google import genai

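client = genai.Client()

# Load and base64-encode the reference images (file names are placeholders).
with open("cat.png", "rb") as cat_file:
    cat_b64 = base64.b64encode(cat_file.read()).decode("utf-8")
with open("yarn.png", "rb") as yarn_file:
    yarn_b64 = base64.b64encode(yarn_file.read()).decode("utf-8")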

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        {"type": "image", "data": cat_b64, "mime_type": "image/png"},
        {"type": "image", "data": yarn_b64, "mime_type": "image/png"},
        {"type": "text", "text": "A cat playfully batting at a ball of yarn."}
    ],
)
with open("cat.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
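const ai = new GoogleGenAI({});

// Load and base64-encode the reference images (file names are placeholders).
const catData = fs.readFileSync('cat.png').toString('base64');
const yarnData = fs.readFileSync('yarn.png').toString('base64');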

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    { type: 'image', data: catData, mime_type: 'image/png' },
    { type: 'image', data: yarnData, mime_type: 'image/png' },
    { type: 'text', text: 'A cat playfully batting at a ball of yarn.' }
  ]
});

if (interaction.output_video?.data) {
  fs.writeFileSync('cat.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": [
   {"type": "image", "data": "'"$CAT_B64"'", "mime_type": "image/png"},
   {"type": "image", "data": "'"$YARN_B64"'", "mime_type": "image/png"},
   {"type": "text", "text": "A cat playfully batting at a ball of yarn."}
 ]
}'

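Task parameter

Use the task parameter in video_config to state the intended behavior explicitly. For example, to have the model generate a video from an image, set it to image_to_video. If the parameter isn't set, the model infers what you want from the prompt.

The following values are allowed:

text_to_video
image_to_video
reference_to_video
edit

The following example sets this parameter for the image-to-video example shown earlier.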

Python

import base64
from google import genai

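client = genai.Client()

# Load and base64-encode the reference image (the file name is a placeholder).
with open("drawing.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")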

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        {"type": "image", "data": base64_image, "mime_type": "image/jpeg"},
        {"type": "text", "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"}
    ],
    generation_config={
      "video_config": {
        "task": "image_to_video",
      }
    },
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from 'fs';
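const ai = new GoogleGenAI({});

// Load and base64-encode the reference image (the file name is a placeholder).
const base64Image = fs.readFileSync('drawing.jpg').toString('base64');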

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    { type: 'image', data: base64Image, mime_type: 'image/jpeg' },
    { type: 'text', text: 'turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video' }
  ],
  // Field names use snake_case, matching the other request fields on this page.
  generation_config: {
    video_config: {
      task: 'image_to_video',
    }
  }
});

if (interaction.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-omni-flash-preview",
    "input": [
      {
        "type": "image",
        "data": "'"$BASE64_IMAGE"'",
        "mime_type": "image/jpeg"
      },
      {
        "type": "text",
        "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"
      }
    ],
    "generation_config": {
      "video_config": {
        "task": "image_to_video"
      }
    }
  }'
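
Since task accepts only the four values listed above, a small guard can catch typos before a request is sent. This is a sketch under that assumption. The helper name is hypothetical, and returning an empty dictionary when no task is given is one possible way to leave the choice to the model.

```python
from typing import Optional

ALLOWED_TASKS = ("text_to_video", "image_to_video", "reference_to_video", "edit")


def video_generation_config(task: Optional[str] = None) -> dict:
    """Build a `generation_config` value with an optional, validated task.

    With task=None the config is left empty so the model infers the
    intended behavior from the prompt.
    """
    if task is None:
        return {}
    if task not in ALLOWED_TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {ALLOWED_TASKS}")
    return {"video_config": {"task": task}}
```

For the example above, generation_config=video_generation_config("image_to_video") produces the same configuration as the literal dictionary.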

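Stateful video editing

Generate a video, then edit it iteratively with follow-up prompts. Each turn builds on the previous result: the model remembers the video context and applies your change while preserving the elements you didn't mention. Use previous_interaction_id to carry the conversation history and the generated video state forward without re-uploading the previous video.

The following example generates an initial video and then edits it: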

Python

import base64
from google import genai

client = genai.Client()

# Turn 1: Generate initial video
res1 = client.interactions.create(model="gemini-omni-flash-preview", input="A woman playing violin outdoors.")

# Turn 2: Edit the previous video
res2 = client.interactions.create(
    model="gemini-omni-flash-preview",
    previous_interaction_id=res1.id,
    input="Make the violin invisible."
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(res2.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

// Turn 1: Generate initial video
const res1 = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A woman playing violin outdoors.',
});

// Turn 2: Edit the previous video
const res2 = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  previous_interaction_id: res1.id,
  input: 'Make the violin invisible.',
});

if (res2.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(res2.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "previous_interaction_id": "'"$PREVIOUS_ID"'",
 "input": "Make the violin invisible."
}'

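Each turn in the conversation produces a new video. The model understands the context of earlier turns, so you can make incremental changes, such as adjusting the lighting or replacing the background, without describing the whole scene again.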
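
For more than two turns, the ID threading can be wrapped in a loop. The following sketch assumes only what the samples above show: a create callable that accepts model, input, and previous_interaction_id keyword arguments and returns an object with an id attribute. The function name run_edit_chain is hypothetical.

```python
from typing import Any, Callable, List


def run_edit_chain(create: Callable[..., Any], model: str, prompts: List[str]) -> List[Any]:
    """Send prompts in order, linking each turn to the one before it.

    The first prompt starts a new interaction. Every later prompt passes
    the previous result's id as `previous_interaction_id`.
    """
    results: List[Any] = []
    previous_id = None
    for prompt in prompts:
        kwargs = {"model": model, "input": prompt}
        if previous_id is not None:
            kwargs["previous_interaction_id"] = previous_id
        interaction = create(**kwargs)
        results.append(interaction)
        previous_id = interaction.id
    return results
```

With the SDK client from the samples above, this would be called as run_edit_chain(client.interactions.create, "gemini-omni-flash-preview", ["A woman playing violin outdoors.", "Make the violin invisible."]). The last element of the returned list holds the most recent video.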

Edit your own videos

Upload a video with the Files API to edit it with Gemini Omni Flash.

The following example shows how to edit an existing video:

Python

import time
import base64
from google import genai

client = genai.Client()

# Upload video using the file API
video_file = client.files.upload(file="Video.mp4")

while video_file.state == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = client.files.get(name=video_file.name)

if video_file.state == "FAILED":
    raise ValueError(f"Video processing failed: {video_file.state}")
print(f"Video processing complete: {video_file.uri}")

# Edit your video
interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        # The uploaded file is a video, so it uses the "video" content type.
        {"type": "video", "uri": video_file.uri},
        {"type": "text", "text": "When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material"}
    ],
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

// Upload video using the file API
let videoFile = await ai.files.upload({
  file: 'Video.mp4',
});

while (videoFile.state === 'PROCESSING') {
  console.log('Waiting for video to be processed.');
  await new Promise(r => setTimeout(r, 10000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

if (videoFile.state === 'FAILED') {
  throw new Error(videoFile.state);
}
console.log('Video processing complete: ' + videoFile.uri);

// Edit your video
const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    // The uploaded file is a video, so it uses the 'video' content type.
    { type: 'video', uri: videoFile.uri },
    { type: 'text', text: "When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material" }
  ],
});

if (interaction.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

#!/bin/bash
# Base64-encode the input video as a single line (strip any wrapping newlines).
VIDEO_B64=$(base64 < "$VIDEO_FILE" | tr -d '\n')

curl -sS -w "\n[HTTP %{http_code}]\n" "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d @- <<EOF > video_editing_response.json
{
  "model": "gemini-omni-flash-preview",
  "input": [
    {
      "type": "user_input",
      "content": [
        {
          "type": "video",
          "mime_type": "video/mp4",
          "data": "$VIDEO_B64"
        },
        {
          "type": "text",
          "text": "When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material"
        }
      ]
    }
  ],
  "response_format": { "type": "video" }
}
EOF
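
The polling loops in the SDK samples above have no upper bound, so a stuck upload would block forever. The following sketch adds a timeout. It assumes only the ACTIVE and FAILED state names used on this page. The helper name and its parameters are hypothetical, and the state lookup is injected as a callable so the helper does not depend on a particular SDK.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `get_state` until it returns "ACTIVE".

    Raises RuntimeError if the state becomes "FAILED" and TimeoutError if
    the deadline passes while the file is still in any other state.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("File processing failed.")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"File still in state {state!r} after {timeout_s} seconds.")
        sleep(interval_s)
```

With the Python SDK this might be called as wait_until_active(lambda: client.files.get(name=video_file.name).state.name), assuming the state object exposes a name attribute.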

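Retrieve videos by URI

Set delivery="uri" in response_format to retrieve generated videos larger than 4 MB. The response contains a Google-hosted URI. Poll the corresponding file until its state is ACTIVE, then download it.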

Python

import time
from google import genai

client = genai.Client()

# 1. Request video via URI delivery
interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A beautiful sunset.",
    response_format={"type": "video", "delivery": "uri"}
)

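# 2. Extract file name and poll for ACTIVE state
video_output = interaction.output_video
# The URI can end in ":download?alt=media", so keep only the ID after "files/".
file_name = video_output.uri.split("files/")[-1].split(":")[0].split("?")[0]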

print("Waiting for video processing...")
while True:
    f_info = client.files.get(name=f"files/{file_name}")
    if f_info.state.name == "ACTIVE":
        break
    elif f_info.state.name == "FAILED":
        raise RuntimeError("Generation failed.")
    time.sleep(5)

# 3. Download the final video
video_bytes = client.files.download(file=video_output.uri)
with open("output.mp4", "wb") as f:
    f.write(video_bytes)

JavaScript

import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({});

// 1. Request video via URI delivery
const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A beautiful sunset.',
  response_format: { type: 'video', delivery: 'uri' },
});

// 2. Extract file name and poll for ACTIVE state
const videoOutput = interaction.output_video;
// Match everything after "files/" up to the next "/", ":" or "?" (IDs may contain hyphens).
const fileId = videoOutput.uri.match(/files\/([^/:?]+)/)[1];
const name = `files/${fileId}`;

console.log("Waiting for video processing...");
while (true) {
  const fInfo = await ai.files.get({ name });
  // In JavaScript the file state is compared as a string, as in the upload example.
  if (fInfo.state === 'ACTIVE') break;
  if (fInfo.state === 'FAILED') throw new Error("Generation failed.");
  await new Promise(r => setTimeout(r, 5000));
}

// 3. Download the final video
await ai.files.download({
  file: videoOutput,
  downloadPath: 'output.mp4',
});
console.log("💾 Saved video to output.mp4");

REST

#!/bin/bash

# 1. Initial request to generate the video
RESPONSE=$(curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": "A beautiful sunset over a calm ocean.",
 "response_format": {"type": "video", "delivery": "uri"}
}')

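# Read the video URI from the steps array (output_video is an SDK-only convenience field),
# then extract FILE_ID (e.g., ".../files/abc-123:download?alt=media" -> "abc-123")
FILE_URI=$(echo "$RESPONSE" | jq -r '.steps[] | select(.type == "model_output") | .content[] | select(.type == "video") | .uri')
FILE_ID=$(echo "$FILE_URI" | sed -E 's#.*files/([^/:?]+).*#\1#')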

echo "Video requested (ID: $FILE_ID). Waiting for processing..."

# 2. Polling loop
while true; do
 # Get current file status
 STATUS_JSON=$(curl -s -X GET "https://generativelanguage.googleapis.com/v1beta/files/$FILE_ID?key=$API_KEY")
 STATE=$(echo $STATUS_JSON | jq -r '.state')

 if [ "$STATE" == "ACTIVE" ]; then
   echo "Processing complete! Downloading..."
   break
 elif [ "$STATE" == "FAILED" ]; then
   echo "Error: Generation failed."
   exit 1
 else
   echo "Current state: $STATE... (waiting 5s)"
   sleep 5
 fi
done

# 3. Final download
curl -L -X GET "https://generativelanguage.googleapis.com/v1beta/files/$FILE_ID:download?alt=media&key=$API_KEY" \
--output "output.mp4"

echo "Done! Video saved to output.mp4"

Raw REST JSON structure (URI):

{
  "steps": [
    { "type": "user_input", "content": [{"type": "text", "text": "..."}] },
    { "type": "thought", "content": [{"text": "...", "type": "thought"}] },
    {
      "type": "model_output",
      "content": [
        {
          "type": "video",
          "mime_type": "video/mp4",
          "uri": "https://generativelanguage.googleapis.com/v1beta/files/...:download?alt=media"
        }
      ]
    }
  ],
  "id": "v1_...",
  "status": "completed",
  "model": "gemini-omni-flash-preview",
  "object": "interaction"
}
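
Given a raw response of this shape, the file name needed for polling can be derived from the video URI. The following is a sketch under the assumption that the URI always contains a files/<id> segment, as in the structure above. Both helper names are hypothetical.

```python
import re
from typing import Optional


def extract_video_uri(response: dict) -> Optional[str]:
    """Return the URI of the first video part in a raw REST response, if any."""
    for step in response.get("steps", []):
        if step.get("type") != "model_output":
            continue
        for part in step.get("content", []):
            if part.get("type") == "video" and "uri" in part:
                return part["uri"]
    return None


def file_name_from_uri(uri: str) -> str:
    """Turn a download URI into a Files API resource name such as "files/abc-123".

    The ID is everything after "files/" up to the next "/", ":" or "?".
    """
    match = re.search(r"files/([^/:?]+)", uri)
    if match is None:
        raise ValueError(f"No file ID found in URI: {uri!r}")
    return f"files/{match.group(1)}"
```

The resulting name can then be passed to the Files API to poll for the ACTIVE state before downloading.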

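Best practices

Use URI delivery for large videos: For videos larger than 4 MB (>720p, if available), set delivery="uri" in response_format to avoid payload size limits.
Optimize for performance: Set background=false, store=false, and stream=false for faster, synchronous unary generation. Note that with store=false, the generated video can't be edited in later turns using previous_interaction_id.
Prompt accuracy: For details, see the prompting guide section.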
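
The first two practices can be combined in one request builder. This sketch assumes that background, store, and stream are top-level request fields, as the wording above suggests. The function name and its flags are hypothetical.

```python
from typing import Any, Dict

MODEL = "gemini-omni-flash-preview"


def build_request(prompt: str, fast_sync: bool = False, large_output: bool = False) -> Dict[str, Any]:
    """Assemble request fields for a video generation call.

    fast_sync    -> background/store/stream all set to False (assumed to be
                    top-level fields). The result then cannot be edited
                    later through previous_interaction_id.
    large_output -> request URI delivery instead of inline base64 data.
    """
    request: Dict[str, Any] = {"model": MODEL, "input": prompt}
    if fast_sync:
        request.update({"background": False, "store": False, "stream": False})
    if large_output:
        request["response_format"] = {"type": "video", "delivery": "uri"}
    return request
```

Keeping fast_sync off by default preserves the ability to continue editing in later turns, which is usually the safer choice for interactive workflows.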

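Limitations

Uploading and editing images that contain minors isn't supported in the European Economic Area (EEA), Switzerland, and the United Kingdom.
Uploading and editing images that contain certain recognizable people isn't supported.
Users in the EEA, Switzerland, and the United Kingdom currently can't edit uploaded videos (editing model-generated videos is supported).
The current version of the API doesn't support uploading audio references.
The API schema accepts video references up to 3 seconds long, but the model currently can't process them correctly.
Referencing or reasoning across multiple videos isn't supported. Multi-video prompts may degrade model performance or produce unexpected output.
Video extension and video interpolation (generating a video between a first and a last frame) aren't supported.
Voice editing isn't supported.
Provisioned Throughput isn't supported.
System instructions, temperature, top_p, stop sequences, and negative prompts aren't supported. (You can put negative instructions in the regular prompt, for example, "do not do X".)
Using YouTube videos as a media source isn't supported.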

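Technical details

All generated videos include a SynthID watermark, which is invisible to viewers but can be detected programmatically for provenance verification.
Video generation time varies with duration, resolution, and current API load. Longer, higher-resolution videos take more time to generate.
Content safety filters are applied to input prompts and generated videos (depending on your region). Prompts that violate the usage policies are blocked.
English (EN) is fully supported. Other languages haven't been evaluated, so they may work, but results may vary.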

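Gemini Omni Flash prompting guide

This section contains tips and examples for prompting Gemini Omni Flash effectively.

Single scene

By default, Omni Flash tries to create a video with several different shots, building an engaging narrative from your prompt.

If you need the output video to contain a single scene, you must prompt for it with phrases such as:

In a single, unbroken scene
In one continuous shot
No scene cuts

A complete prompt might look like this: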

Continuous, unbroken handheld shot of a fluffy tabby cat sitting on a sunny windowsill, looking out into a leafy garden. The cat's tail twitches slowly, and its ears rotate slightly toward ambient noises. Sunbeams illuminate dust motes in the air. Sound design: Gentle breeze, distant bird chirps. No dialogue.

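Remove unwanted elements

If the generated video contains something you don't want, add a simple negative instruction to the prompt to avoid it:

No dialogue
No decorations
No extra sound effects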
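
Because negative prompts go in the regular prompt text, exclusions can be appended programmatically. The following is a minimal sketch. The helper name is hypothetical, and the "No ..." phrasing simply mirrors the examples above.

```python
from typing import List


def append_exclusions(prompt: str, exclusions: List[str]) -> str:
    """Append one short "No <item>." sentence per exclusion to the prompt."""
    base = prompt.strip()
    # Close the base prompt with a period so the exclusions read as sentences.
    if base and base[-1] not in ".!?":
        base += "."
    clauses = [f"No {item.strip()}." for item in exclusions if item.strip()]
    return " ".join([base, *clauses]).strip()
```

For instance, append_exclusions("A cat on a sunny windowsill", ["dialogue", "extra sound effects"]) yields a single prompt string that ends with the two exclusion sentences.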

Prompts for editing

Simple prompts work best for video editing. Overly detailed prompts can cause unintended changes.

Here are some examples of simple editing prompts:

Turn this video into anime
Put a stylish hat on this person
Make the lighting more dramatic
Change the text on the sign to "Omni Flash"

When you edit a specific aspect of a video, add "Keep everything else the same" to maintain visual consistency.

The following examples show how to apply this technique:

Avoid: In the video of the man sitting on the sofa, please add a small black cat that runs from the right side of the screen, jumps onto his lap, and then he starts to stroke its head while looking down.
- Simplify to: Add a cat that jumps onto his lap, he begins to pet it. Keep everything else the same.
Avoid: Please remove the cell phone that the person is holding in their hand and fill in the background so it looks like they are just holding their hand empty.
- Simplify to: Make the phone invisible. Keep everything else the same.
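
In an application that sends many edits, the "Keep everything else the same" suffix can be added automatically so it is never forgotten. The following is a small sketch, with a hypothetical helper name.

```python
KEEP_SUFFIX = "Keep everything else the same."


def targeted_edit_prompt(change: str) -> str:
    """Return the edit instruction followed by the consistency suffix.

    The suffix is not added twice if the instruction already ends with it.
    """
    text = change.strip()
    if not text:
        raise ValueError("The edit instruction must not be empty.")
    if text.endswith(KEEP_SUFFIX):
        return text
    if text[-1] not in ".!?":
        text += "."
    return f"{text} {KEEP_SUFFIX}"
```

The returned string can be used directly as the input of an editing turn.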

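Prompting for audio

By default, the model tries to generate a suitable audio track for the video, which may not always be what you want. Use the prompt to describe the kind of audio you need. This is especially important if you want music in the video:

Include soothing background music
The video has a high-energy electronic beat
The audio is a muffled radio broadcast of a song playing in the background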

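Timed events

You can prompt for events to happen at specific times in the video. No exact syntax is required; natural language works. This is especially useful for creating your own scene cuts, pacing, or rapid sequences. For example:

After 3 seconds, a woman enters the scene.
At 5 seconds, a choir starts singing in the background audio.
Cut to a new frame every 2 seconds.
In a rapid sequence, change the scene to a new location every half second (12 frames at 24 fps).

You can also use timecode syntax: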

[0-3s] A person is walking
[3-6s] They stop and turn around
[6-10s] They start running
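
Timecoded prompts like the one above can be generated from structured data, which also makes it easy to check that segments don't overlap. The following sketch uses a hypothetical helper name and assumes whole-second boundaries, as in the example.

```python
from typing import Iterable, Tuple


def timecoded_prompt(segments: Iterable[Tuple[int, int, str]]) -> str:
    """Format (start, end, description) tuples as "[start-ends] description" lines.

    Segments must be given in order, must not overlap, and must each have
    a positive duration.
    """
    lines = []
    previous_end = 0
    for start, end, description in segments:
        if start < previous_end or end <= start:
            raise ValueError(f"Invalid or overlapping segment: {start}-{end}")
        lines.append(f"[{start}-{end}s] {description}")
        previous_end = end
    return "\n".join(lines)
```

Passing the three segments from the example above reproduces the same three-line prompt.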

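Meta-prompting

You can ask Gemini Omni Flash to pay attention to general qualities or principles of video generation:

Consider micro-details, expressions, and timing to create a very rich, detailed, yet completely natural scene.
Be very detailed when describing people and environments. Apply costume design principles to the people. Describe the people, items, and objects in the scene very specifically.
Add plenty of appropriate detail to background elements so the scene feels real and natural.
Make a fast-paced video that shows a different rare [thing] every 1 second, with upbeat music, and add text to label each thing.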

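Text in videos

You can prompt for text in the video, and Gemini Omni renders it correctly and legibly. If text would naturally appear in the video (even in background elements), it also helps to specify what that text says.

One word at a time appears on screen: "did, you, know, that, Omni, can, do, awesome, text?" Each word is shown for 1 second in a different animation style. No dialogue.
There is a street sign that reads "This is an AI generation by Omni"; a storefront that reads "All you need AI"; and a car with the license plate "OMN111"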

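Use tags in prompts to set image roles

You can use tags to bind uploaded media to specific generation roles. This lets you specify whether each image is the initial frame or a reference.

1. Simple tags (recommended)

For simple cases where the role of each image is clear from the prompt, you can bind images to roles directly:

<FIRST_FRAME>: Uses the image as the starting frame of the video, for example: <FIRST_FRAME> a woman is walking
<IMAGE_REF_N>: Uses the image as a reference, for example: in the style of <IMAGE_REF_0> a woman <IMAGE_REF_1> is walking (combines a style reference from the first image with a subject reference from the second). Image references are numbered from 0.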

The following example uses six reference images:

[0-3s] A studio fashion sequence. Starting with woman <IMAGE_REF_0>, she is holding <IMAGE_REF_1>
[3-6s] Then we see the man <IMAGE_REF_2> holding <IMAGE_REF_3>
[6-10s] And finally another woman <IMAGE_REF_4> who is holding <IMAGE_REF_5> while walking.

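2. Explicit declarations

For more complex cases with multiple images and multiple roles, you can pair explicit prefix tags with a natural-language instruction suffix.

Declare source and reference images:
- [# Sources <FIRST_FRAME>@Image1] uses the first image as the starting frame.
- [# References <IMAGE_REF_0>@Image1] uses the first image as a reference.
- [# References <IMAGE_REF_1>@Image2] uses the second image as a reference.
- [# References <IMAGE_REF_0>@Image1 <IMAGE_REF_1>@Image2] uses both images as references.
- [# Sources <FIRST_FRAME>@Image1] [# References <IMAGE_REF_0>@Image2] uses the first image as the starting frame and the second image as a reference.
Guidance instructions: Add a guidance instruction at the end of the prompt:
- For a starting frame: "Use this image as the starting frame."
- For reference images: "Use the given image(s) as references for video generation. The images should not be used as literal initial frames."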

Expanded prompt example:

[# Sources <FIRST_FRAME>@Image1] [# References <IMAGE_REF_0>@Image2] a woman <IMAGE_REF_0> is walking. Use Image1 as the starting frame. Use Image2 as a reference for the video generation.
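
The explicit-declaration format is regular enough to assemble in code. The following sketch builds the prefix tags and guidance suffix from image positions. The function name and parameters are hypothetical. It numbers references from 0 in the order given. For the suffix it names the starting-frame image as the expanded example does and reuses the generic reference sentence from the guidance list, which is one possible wording rather than a required one.

```python
from typing import Optional, Sequence


def explicit_tag_prompt(
    text: str,
    first_frame: Optional[int] = None,
    references: Sequence[int] = (),
) -> str:
    """Build a prompt with explicit source/reference declarations.

    `first_frame` and `references` hold 1-based positions of the uploaded
    images (Image1, Image2, ...). Reference tags are numbered from 0.
    """
    parts = []
    if first_frame is not None:
        parts.append(f"[# Sources <FIRST_FRAME>@Image{first_frame}]")
    if references:
        refs = " ".join(f"<IMAGE_REF_{i}>@Image{n}" for i, n in enumerate(references))
        parts.append(f"[# References {refs}]")
    parts.append(text.strip())
    if first_frame is not None:
        parts.append(f"Use Image{first_frame} as the starting frame.")
    if references:
        parts.append(
            "Use the given image(s) as references for video generation. "
            "The images should not be used as literal initial frames."
        )
    return " ".join(parts)
```

Calling explicit_tag_prompt("a woman <IMAGE_REF_0> is walking.", first_frame=1, references=[2]) produces the same declarations as the expanded example above, followed by the guidance sentences.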

Next steps

Experiment in the Omni quickstart Colab to get started with Gemini Omni Flash.
Learn how to write better prompts with our introduction to prompt design.