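Generate and edit videos with Gemini Omni Flash

Gemini Omni Flash (gemini-omni-flash-preview) is a high-performance multimodal model designed for fast video generation, editing, and cinematic control. Gemini Omni is built on the following core capabilities, which set it apart from earlier video models:

  • Native multimodality: It processes text, images, audio, and video together, giving you more cohesive, consistent, and controllable output.
  • Conversational editing: With the Interactions API, you can iteratively refine and edit videos through natural-language conversation. Describe what you want to change, and the model applies the edit while preserving the parts of the video you want to keep.
  • World knowledge: Gemini Omni combines an understanding of physics with Gemini's knowledge of history, science, and culture, bridging the gap between photorealism and meaningful storytelling.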

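Text-to-video

Generate a video from a text prompt. The model produces a video with audio based on your description. For best results, include details such as the scene description, camera movement, lighting, and mood in your prompt.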

Python

import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A marble rolling fast on a chain reaction style track, continuous smooth shot."
)
with open("marble.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

const interaction = await ai.interactions.create({  
  model: 'gemini-omni-flash-preview',  
  input: 'A marble rolling fast on a chain reaction style track, continuous smooth shot.',
});

if (interaction.output_video?.data) {
  fs.writeFileSync('marble.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": "A marble rolling fast on a chain reaction style track, continuous smooth shot."
}'

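REST response schema

The convenience field interaction.output_video is available only in the SDKs. When you call the REST API directly, read the video output from the steps array.

Raw REST JSON structure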

{
  "steps": [
    { "type": "user_input", "content": [{"type": "text", "text": "..."}] },
    { "type": "thought", "content": [{"text": "...", "type": "thought"}] },
    {
      "type": "model_output",
      "content": [
        {
          "type": "video",
          "mime_type": "video/mp4",
          "data": "AAAAIGZ0eXBpc29t..." // Base64 encoded video data
        }
      ]
    }
  ],
  "id": "v1_...",
  "status": "completed",
  "model": "gemini-omni-flash-preview",
  "object": "interaction"
}
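
As a minimal sketch of reading the steps array, the helper below walks a parsed REST response and decodes the first inline video it finds. The function name extract_video_bytes is hypothetical, and the sketch assumes the response has exactly the shape shown above (a model_output step whose content holds a video part with base64 data).

```python
import base64


def extract_video_bytes(response: dict) -> bytes:
    """Return the decoded bytes of the first inline video in a raw REST response."""
    for step in response.get("steps", []):
        # Only model_output steps carry generated media.
        if step.get("type") != "model_output":
            continue
        for part in step.get("content", []):
            if part.get("type") == "video" and "data" in part:
                return base64.b64decode(part["data"])
    raise ValueError("no inline video found in response steps")
```

With a response saved to disk, you could load it with json.load and write the returned bytes to an .mp4 file.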

Control the aspect ratio

Set aspect_ratio to "9:16" to create a portrait video. The default is landscape (16:9).

Python

import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A futuristic city with neon lights and flying cars, cyberpunk style",
    response_format={
        "type": "video",  # optional
        "aspect_ratio": "9:16"  # Supported values: "9:16", "16:9"
    }
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A futuristic city with neon lights and flying cars, cyberpunk style',
  response_format: {
    type: 'video', // optional
    aspect_ratio: '9:16' // Supported values: '9:16', '16:9'
  },
});

if (interaction.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": "A futuristic city with neon lights and flying cars, cyberpunk style",
 "response_format": {
   "type": "video",
   "aspect_ratio": "9:16"
 }
}'
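
Because only two aspect ratios are listed as supported, it can help to validate the value before sending a request. The following is a small sketch; the helper name video_response_format and the constant are hypothetical, and the returned dictionary simply mirrors the response_format shown in the samples.

```python
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16")


def video_response_format(aspect_ratio: str = "16:9") -> dict:
    """Build a response_format dict, rejecting aspect ratios the samples don't list."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(
            f"unsupported aspect ratio {aspect_ratio!r}; "
            f"expected one of {SUPPORTED_ASPECT_RATIOS}"
        )
    return {"type": "video", "aspect_ratio": aspect_ratio}
```

The result can be passed as the response_format argument of client.interactions.create.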

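Image-to-video

You can provide a reference image along with a text prompt. The model decides how to use the image based on your prompt. This is useful for bringing product photos, illustrations, or photographs to life.

The following example uses a drawing of a fish jumping out of the water as the reference image:

[Image: a drawing of a fish jumping out of the water]

Combined with the following prompt:

turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video

The model generates a realistic video based on the drawing.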

Python

import base64
from google import genai

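client = genai.Client()

# Load and base64-encode the reference image ("drawing.jpg" is a placeholder path).
with open("drawing.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")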

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        {"type": "image", "data": base64_image, "mime_type": "image/jpeg"},
        {"type": "text", "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"}
    ],
)
with open("clownfish.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
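const ai = new GoogleGenAI({});

// Load and base64-encode the reference image ('drawing.jpg' is a placeholder path).
const base64Image = fs.readFileSync('drawing.jpg').toString('base64');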

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    { type: 'image', data: base64Image, mime_type: 'image/jpeg' },
    { type: 'text', text: 'turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video' }
  ]
});

if (interaction.output_video?.data) {
  fs.writeFileSync('clownfish.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": [
   {"type": "image", "data": "'"$BASE64_IMAGE"'", "mime_type": "image/jpeg"},
   {"type": "text", "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"}
 ]
}'
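
The image parts in these requests all follow the same pattern: base64 data plus a MIME type. The sketch below builds the input list from local files. The helper names image_part and build_input are hypothetical, and the MIME type is guessed from the file extension with the standard mimetypes module, which is an assumption about how your files are named rather than anything the API requires.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path: str) -> dict:
    """Read a local image and return it as an inline image part."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image", "data": data, "mime_type": mime_type}


def build_input(image_paths: list[str], prompt: str) -> list[dict]:
    """Return image parts in the given order, followed by the text prompt."""
    return [image_part(p) for p in image_paths] + [{"type": "text", "text": prompt}]
```

The returned list can be passed as the input argument of client.interactions.create, for one image or several.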

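Subject reference

You can generate a video that features specific subjects supplied as reference images. For example, the following code provides two images, one of a cat and one of a ball of yarn, to generate a video of the cat playing with the yarn.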

Python

import base64
from google import genai

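client = genai.Client()

# Load and base64-encode the reference images (placeholder paths).
with open("cat.png", "rb") as f:
    cat_b64 = base64.b64encode(f.read()).decode("utf-8")
with open("yarn.png", "rb") as f:
    yarn_b64 = base64.b64encode(f.read()).decode("utf-8")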

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        {"type": "image", "data": cat_b64, "mime_type": "image/png"},
        {"type": "image", "data": yarn_b64, "mime_type": "image/png"},
        {"type": "text", "text": "A cat playfully batting at a ball of yarn."}
    ],
)
with open("cat.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
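const ai = new GoogleGenAI({});

// Load and base64-encode the reference images (placeholder paths).
const catData = fs.readFileSync('cat.png').toString('base64');
const yarnData = fs.readFileSync('yarn.png').toString('base64');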

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    { type: 'image', data: catData, mime_type: 'image/png' },
    { type: 'image', data: yarnData, mime_type: 'image/png' },
    { type: 'text', text: 'A cat playfully batting at a ball of yarn.' }
  ]
});

if (interaction.output_video?.data) {
  fs.writeFileSync('cat.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": [
   {"type": "image", "data": "'"$CAT_B64"'", "mime_type": "image/png"},
   {"type": "image", "data": "'"$YARN_B64"'", "mime_type": "image/png"},
   {"type": "text", "text": "A cat playfully batting at a ball of yarn."}
 ]
}'

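Task parameter

Use the task parameter in video_config to state the intended behavior explicitly. For example, if you want the model to generate a video from an image, set the parameter to image_to_video. If you don't set it, the model infers what you want from the prompt.

The following values are allowed:

  • text_to_video
  • image_to_video
  • reference_to_video
  • edit

The following example sets this parameter for the image-to-video example shown earlier.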

Python

import base64
from google import genai

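client = genai.Client()

# Load and base64-encode the reference image ("drawing.jpg" is a placeholder path).
with open("drawing.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")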

interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        {"type": "image", "data": base64_image, "mime_type": "image/jpeg"},
        {"type": "text", "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"}
    ],
    generation_config={
      "video_config": {
        "task": "image_to_video",
      }
    },
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from 'fs';
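const ai = new GoogleGenAI({});

// Load and base64-encode the reference image ('drawing.jpg' is a placeholder path).
const base64Image = fs.readFileSync('drawing.jpg').toString('base64');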

const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    { type: 'image', data: base64Image, mime_type: 'image/jpeg' },
    { type: 'text', text: 'turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video' }
  ],
  generation_config: {
    video_config: {
      task: 'image_to_video',
    }
  }
});

if (interaction.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-omni-flash-preview",
    "input": [
      {
        "type": "image",
        "data": "'"$BASE64_IMAGE"'",
        "mime_type": "image/jpeg"
      },
      {
        "type": "text",
        "text": "turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video"
      }
    ],
    "generation_config": {
      "video_config": {
        "task": "image_to_video"
      }
    }
  }'
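
Since the task value is a fixed set of strings, a typo is easy to catch before the request goes out. The sketch below is one way to do that; the helper name video_generation_config is hypothetical, and returning an empty dictionary when no task is given is a design choice that leaves the inference to the model, as described above.

```python
from typing import Optional

ALLOWED_TASKS = ("text_to_video", "image_to_video", "reference_to_video", "edit")


def video_generation_config(task: Optional[str] = None) -> dict:
    """Build a generation_config dict, validating the task against the allowed values."""
    if task is None:
        # No explicit task: the model infers the intent from the prompt.
        return {}
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {ALLOWED_TASKS}")
    return {"video_config": {"task": task}}
```

The result can be passed as the generation_config argument shown in the Python sample.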

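Stateful video editing

Generate a video, then edit it iteratively with follow-up prompts. Each turn builds on the previous result. The model remembers the video context and applies your changes while preserving the elements you didn't mention. Use previous_interaction_id to carry over the conversation history and the generated video state without re-uploading the previous video.

The following example generates an initial video and then edits it: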

Python

import base64
from google import genai

client = genai.Client()

# Turn 1: Generate initial video
res1 = client.interactions.create(model="gemini-omni-flash-preview", input="A woman playing violin outdoors.")

# Turn 2: Edit the previous video
res2 = client.interactions.create(
    model="gemini-omni-flash-preview",
    previous_interaction_id=res1.id,
    input="Make the violin invisible."
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(res2.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

// Turn 1: Generate initial video
const res1 = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A woman playing violin outdoors.',
});

// Turn 2: Edit the previous video
const res2 = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  previous_interaction_id: res1.id,
  input: 'Make the violin invisible.',
});

if (res2.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(res2.output_video.data, 'base64'));
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "previous_interaction_id": "'"$PREVIOUS_ID"'",
 "input": "Make the violin invisible."
}'

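Every turn in the conversation produces a new video. The model understands the context of earlier turns, so you can make incremental changes, such as adjusting the lighting or replacing the background, without re-describing the whole scene.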
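
For more than one edit, the chaining can be written as a loop in which each turn passes the id of the turn before it. This is a sketch only: the function name generate_and_edit is hypothetical, and it assumes a client whose interactions.create accepts the same keyword arguments as in the Python sample above.

```python
def generate_and_edit(client, model: str, first_prompt: str, edit_prompts: list[str]) -> list:
    """Return every interaction in order: the initial generation, then one per edit."""
    # Turn 1: generate the initial video (no previous interaction yet).
    interactions = [client.interactions.create(model=model, input=first_prompt)]
    for prompt in edit_prompts:
        previous = interactions[-1]
        # Each later turn edits the video produced by the turn before it.
        interactions.append(
            client.interactions.create(
                model=model,
                previous_interaction_id=previous.id,
                input=prompt,
            )
        )
    return interactions
```

Keeping every interaction, rather than only the last one, lets you save each intermediate video or branch from an earlier turn by reusing its id.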

Edit your own video

Upload a video with the Files API so that Gemini Omni Flash can edit it.

The following example edits an uploaded video:

Python

import time
import base64
from google import genai

client = genai.Client()

# Upload video using the file API
video_file = client.files.upload(file="Video.mp4")

while video_file.state == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = client.files.get(name=video_file.name)

if video_file.state == "FAILED":
    raise ValueError(video_file.state)
print("Video processing complete: " + video_file.uri)

# Edit your video
interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input=[
        {"type": "document", "uri": video_file.uri},
        {"type": "text", "text": "When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material"}
    ],
)
with open("example.mp4", "wb") as f:
    f.write(base64.b64decode(interaction.output_video.data))

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const ai = new GoogleGenAI({});

// Upload video using the file API
let videoFile = await ai.files.upload({
  file: 'Video.mp4',
});

while (videoFile.state === 'PROCESSING') {
  console.log('Waiting for video to be processed.');
  await new Promise(r => setTimeout(r, 10000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

if (videoFile.state === 'FAILED') {
  throw new Error(videoFile.state);
}
console.log('Video processing complete: ' + videoFile.uri);

// Edit your video
const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: [
    { type: 'document', uri: videoFile.uri },
    { type: 'text', text: "When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material" }
  ],
});

if (interaction.output_video?.data) {
  fs.writeFileSync('example.mp4', Buffer.from(interaction.output_video.data, 'base64'));
}

REST

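#!/bin/bash
# Base64-encode the video onto a single line (VIDEO_FILE is the path to your local video).
VIDEO_B64=$(base64 < "$VIDEO_FILE" | tr -d '\n')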

curl -sS -w "\n[HTTP %{http_code}]\n" "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d @- <<EOF > video_editing_response.json
{
  "model": "gemini-omni-flash-preview",
  "input": [
    {
      "type": "user_input",
      "content": [
        {
          "type": "video",
          "mime_type": "video/mp4",
          "data": "$VIDEO_B64"
        },
        {
          "type": "text",
          "text": "When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material"
        }
      ]
    }
  ],
  "response_format": { "type": "video" }
}
EOF

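Retrieve the video by URI

Set delivery="uri" in response_format to retrieve generated videos larger than 4 MB. The response then contains a Google-hosted URI. Poll the file until its state is ACTIVE, and then download it.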

Python

import time
from google import genai

client = genai.Client()

# 1. Request video via URI delivery
interaction = client.interactions.create(
    model="gemini-omni-flash-preview",
    input="A beautiful sunset.",
    response_format={"type": "video", "delivery": "uri"}
)

# 2. Extract file name and poll for ACTIVE state
video_output = interaction.output_video
file_name = video_output.uri.split("/")[-1].split(":")[0]  # Extract the file ID, dropping any ":download?..." suffix

print("Waiting for video processing...")
while True:
    f_info = client.files.get(name=f"files/{file_name}")
    if f_info.state.name == "ACTIVE":
        break
    elif f_info.state.name == "FAILED":
        raise RuntimeError("Generation failed.")
    time.sleep(5)

# 3. Download the final video
video_bytes = client.files.download(file=video_output.uri)
with open("output.mp4", "wb") as f:
    f.write(video_bytes)

JavaScript

import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({});

// 1. Request video via URI delivery
const interaction = await ai.interactions.create({
  model: 'gemini-omni-flash-preview',
  input: 'A beautiful sunset.',
  response_format: { type: 'video', delivery: 'uri' },
});

// 2. Extract file name and poll for ACTIVE state
const videoOutput = interaction.output_video;
const fileId = videoOutput.uri.match(/files\/([a-zA-Z0-9]+)/)[1];
const name = `files/${fileId}`;

console.log("Waiting for video processing...");
while (true) {
  const fInfo = await ai.files.get({ name });
  if (fInfo.state.name === 'ACTIVE') break;
  if (fInfo.state.name === 'FAILED') throw new Error("Generation failed.");
  await new Promise(r => setTimeout(r, 5000));
}

// 3. Download the final video
await ai.files.download({
  file: videoOutput,
  downloadPath: 'output.mp4',
});
console.log("💾 Saved video to output.mp4");

REST

#!/bin/bash

# 1. Initial request to generate the video
RESPONSE=$(curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/interactions?key=$API_KEY" \
-H "Content-Type: application/json" \
-d '{
 "model": "gemini-omni-flash-preview",
 "input": "A beautiful sunset over a calm ocean.",
 "response_format": {"type": "video", "delivery": "uri"}
}')

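# Extract FILE_ID from the video URI in the steps array
# (e.g., ".../files/abc-123:download?alt=media" -> "abc-123")
FILE_URI=$(echo "$RESPONSE" | jq -r '.steps[] | select(.type == "model_output") | .content[] | select(.type == "video") | .uri')
FILE_ID=$(echo "$FILE_URI" | sed -E 's#.*files/([^:/?]+).*#\1#')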

echo "Video requested (ID: $FILE_ID). Waiting for processing..."

# 2. Polling loop
while true; do
 # Get current file status
 STATUS_JSON=$(curl -s -X GET "https://generativelanguage.googleapis.com/v1beta/files/$FILE_ID?key=$API_KEY")
 STATE=$(echo $STATUS_JSON | jq -r '.state')

 if [ "$STATE" == "ACTIVE" ]; then
   echo "Processing complete! Downloading..."
   break
 elif [ "$STATE" == "FAILED" ]; then
   echo "Error: Generation failed."
   exit 1
 else
   echo "Current state: $STATE... (waiting 5s)"
   sleep 5
 fi
done

# 3. Final download
curl -L -X GET "https://generativelanguage.googleapis.com/v1beta/files/$FILE_ID:download?alt=media&key=$API_KEY" \
--output "output.mp4"

echo "Done! Video saved to output.mp4"
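
The polling loops above run until the file becomes ACTIVE or FAILED, so a stuck file would keep them waiting indefinitely. The sketch below adds a timeout. The function name wait_until_active, its parameters, and the default timeout are all hypothetical; get_state is any callable you supply that returns the current state as a string.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state until it returns "ACTIVE"; raise on "FAILED" or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("Generation failed.")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still in state {state!r} after {timeout_s} seconds")
        sleep(interval_s)
```

With the Python sample above, the call could look like wait_until_active(lambda: client.files.get(name=f"files/{file_name}").state.name). The sleep parameter exists only so the loop can be exercised without real delays.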

Raw REST JSON structure (URI)

{
  "steps": [
    { "type": "user_input", "content": [{"type": "text", "text": "..."}] },
    { "type": "thought", "content": [{"text": "...", "type": "thought"}] },
    {
      "type": "model_output",
      "content": [
        {
          "type": "video",
          "mime_type": "video/mp4",
          "uri": "https://generativelanguage.googleapis.com/v1beta/files/...:download?alt=media"
        }
      ]
    }
  ],
  "id": "v1_...",
  "status": "completed",
  "model": "gemini-omni-flash-preview",
  "object": "interaction"
}
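
The uri in this structure ends with a ":download?alt=media" suffix, so the file name used for polling has to be cut out of it. The following sketch does that with a regular expression. The helper name file_name_from_uri is hypothetical, and the pattern assumes the file ID is whatever follows "files/" up to the next slash, colon, or question mark.

```python
import re

_FILE_ID = re.compile(r"files/([^/:?]+)")


def file_name_from_uri(uri: str) -> str:
    """Return the "files/<id>" resource name embedded in a video URI."""
    match = _FILE_ID.search(uri)
    if match is None:
        raise ValueError(f"no file ID found in {uri!r}")
    return f"files/{match.group(1)}"
```

The same helper accepts a bare resource name such as "files/abc-123" and returns it unchanged.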


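Best practices

  • Use URI delivery for large videos: For videos larger than 4 MB (resolutions above 720p, where available), set delivery="uri" in response_format to stay within payload size limits.
  • Optimize performance: Set background=false, store=false, and stream=false for faster, synchronous unary generation. Note that with store=false, the generated video can't be edited in later turns using previous_interaction_id.
  • Prompt accuracy: See the prompt guide section for details.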

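Limitations

  • Uploading and editing images that contain minors is not supported in the European Economic Area (EEA), Switzerland, and the United Kingdom.
  • Uploading and editing images that contain certain identifiable people is not supported.
  • Users in the EEA, Switzerland, and the United Kingdom currently can't edit uploaded videos (editing model-generated videos is supported).
  • The current API version doesn't support uploading audio references.
  • The API schema accepts video references up to 3 seconds long, but the model can't currently process them correctly.
  • Referencing or reasoning across multiple videos is not supported. Multi-video prompts may degrade model performance or produce unexpected output.
  • Video extension and video interpolation (generating a video between a first and a last frame) are not supported.
  • Voice editing is not supported.
  • Provisioned Throughput is not supported.
  • System instructions, temperature, top_p, stop sequences, and negative prompts are not supported (you can put negative instructions in the regular prompt, for example, "don't do X").
  • YouTube videos are not supported as a media source.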

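Technical details

  • All generated videos include a SynthID watermark, which is invisible to viewers but can be detected programmatically for provenance verification.
  • Video generation time varies with duration, resolution, and current API load. Longer and higher-resolution videos take more time to generate.
  • Content safety filters are applied to input prompts and generated videos (depending on your region). Prompts that violate the usage policies are blocked.
  • English (EN) is fully supported. Other languages haven't been evaluated. They may work, but results can vary.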

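Gemini Omni Flash prompt guide

This section contains tips and examples for prompting Gemini Omni Flash effectively.

Single scene

By default, Omni Flash tries to create a video with several different shots and to build an interesting narrative from the prompt.

If you need the output video to consist of a single scene, say so in the prompt with a phrase such as:

  • In a single unbroken scene
  • In one continuous shot
  • No scene cuts

For example: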

Continuous, unbroken handheld shot of a fluffy tabby cat sitting on a sunny windowsill, looking out into a leafy garden. The cat's tail twitches slowly, and its ears rotate slightly toward ambient noises. Sunbeams illuminate dust motes in the air. Sound design: Gentle breeze, distant bird chirps. No dialogue.

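Remove unwanted elements

If the generated video contains something you don't want, add a simple negative instruction to the prompt to avoid it:

  • No dialogue
  • No decorations
  • No extra sound effects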

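Prompts for editing

Simple prompts work best for video editing. Overly detailed prompts can cause unintended changes.

Here are some examples of simple editing prompts:

  • Turn this video into anime
  • Put a stylish hat on the person
  • Change the lighting to be more dramatic
  • Change the text on the sign to "Omni Flash"

When you edit a specific aspect of a video, add "Keep everything else the same" to preserve visual consistency.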

The following examples show how to apply this technique:

  • Avoid: In the video of the man sitting on the sofa, please add a small black cat that runs from the right side of the screen, jumps onto his lap, and then he starts to stroke its head while looking down.
    • Simplify to: Add a cat that jumps onto his lap, he begins to pet it. Keep everything else the same.
  • Avoid: Please remove the cell phone that the person is holding in their hand and fill in the background so it looks like they are just holding their hand empty.
    • Simplify to: Make the phone invisible. Keep everything else the same.
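
If you send many edits from code, appending the consistency phrase can be automated. The following is a minimal sketch; the helper name edit_prompt is hypothetical, and it only adds the "Keep everything else the same." suffix quoted above when it isn't already there.

```python
KEEP_SUFFIX = "Keep everything else the same."


def edit_prompt(change: str) -> str:
    """Return the edit instruction followed by the consistency suffix, added once."""
    change = change.strip()
    # Close the instruction with punctuation so the suffix reads as its own sentence.
    if not change.endswith((".", "!", "?")):
        change += "."
    if change.endswith(KEEP_SUFFIX):
        return change
    return f"{change} {KEEP_SUFFIX}"
```

For example, edit_prompt("Make the phone invisible") produces the simplified prompt from the second pair above.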

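Prompting for audio

By default, the model tries to generate a suitable audio track for the video, which may not always be what you want. You can describe the kind of audio you need in the prompt. This matters most when you want music in the video:

  • Include soothing background music
  • The video has a high-energy electronic beat
  • The audio is a muffled radio broadcast with a song playing in the background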

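Timed events

You can prompt for events to happen at specific times in the video. No exact syntax is required, so natural language works. This is especially useful for creating your own scene cuts, pacing, or rapid sequences. See the following examples:

  • After 3 seconds, a woman enters the scene.
  • At 5 seconds, a choir starts in the background audio.
  • Cut to a new frame every 2 seconds.
  • In a rapid sequence, change the scene to a new location every half second (12 frames at 24 fps).

You can also use timecode syntax: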

[0-3s] A person is walking
[3-6s] They stop and turn around
[6-10s] They start running
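
When the timeline comes from data, the timecode lines can be generated rather than typed. The sketch below formats (start, end, description) tuples in the bracketed style shown above. The function name timecode_prompt is hypothetical, and the check that segments start at 0 and follow each other without gaps is a choice made for this sketch, not a documented requirement of the model.

```python
def timecode_prompt(segments: list[tuple[int, int, str]]) -> str:
    """Format (start, end, description) tuples as "[start-ends] description" lines."""
    lines = []
    previous_end = 0
    for start, end, description in segments:
        # Sketch-level sanity check: contiguous, forward-moving segments.
        if start != previous_end or end <= start:
            raise ValueError(f"segment {start}-{end}s does not follow {previous_end}s")
        lines.append(f"[{start}-{end}s] {description}")
        previous_end = end
    return "\n".join(lines)
```

Calling it with the three segments above reproduces the example prompt line for line.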

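Meta-prompting

You can ask Gemini Omni Flash to pay attention to general qualities or principles of video generation:

  • Consider micro-details, expressions, and timing to create a very rich, detailed, yet completely natural scene.
  • Be very detailed when describing people and environments. Apply costume design principles to the people. Describe the people, items, and objects in the scene very specifically.
  • Add plenty of appropriate detail to background elements so the scene feels real and natural.
  • Make a fast-paced video that shows a different rare [thing] every 1 second, with upbeat music, and add text that labels each thing.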

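Text in videos

You can prompt for text in the video, and Gemini Omni renders it correctly and legibly. It also helps to specify the wording wherever text would naturally appear in the video, even in background elements.

  • One word at a time appears on screen: "did, you, know, that, Omni, can, do, awesome, text?" Each word is shown for 1 second in a different animation style. No dialogue.
  • There is a street sign that says "This is an AI generation by Omni", a storefront that says "All you need AI", and a car with the license plate "OMN111".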

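Use tags in prompts to set image roles

You can use tags to bind uploaded media to specific generation roles. This lets you specify whether each image is the initial frame or a reference.

1. Simple tags (recommended)

For simple cases where each image's role is clear from the prompt, you can bind an image to a role directly:

  • <FIRST_FRAME>: Uses the image as the starting frame of the video, for example: <FIRST_FRAME> a woman is walking
  • <IMAGE_REF_N>: Uses the image as a reference, for example: in the style of <IMAGE_REF_0> a woman <IMAGE_REF_1> is walking (this combines a style reference from the first image with a subject reference from the second). Image references are numbered from 0.

The following example uses six reference images: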

[0-3s] A studio fashion sequence. Starting with woman <IMAGE_REF_0>, she is holding <IMAGE_REF_1>
[3-6s] Then we see the man <IMAGE_REF_2> holding <IMAGE_REF_3>
[6-10s] And finally another woman <IMAGE_REF_4> who is holding <IMAGE_REF_5> while walking.

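2. Explicit declarations

For more complex cases with multiple images and multiple roles, you can pair explicit prefix tags with a natural-language instruction suffix.

  • Declare source and reference images:
    • [# Sources <FIRST_FRAME>@Image1] uses the first image as the starting frame.
    • [# References <IMAGE_REF_0>@Image1] uses the first image as a reference.
    • [# References <IMAGE_REF_1>@Image2] uses the second image as a reference.
    • [# References <IMAGE_REF_0>@Image1 <IMAGE_REF_1>@Image2] uses both images as references.
    • [# Sources <FIRST_FRAME>@Image1] [# References <IMAGE_REF_0>@Image2] uses the first image as the starting frame and the second image as a reference.
  • Guidance instructions: Add a guidance instruction at the end of the prompt:
    • For a starting frame: "Use this image as the starting frame."
    • For reference images: "Use the given image(s) as references for video generation. The images should not be used as literal initial frames."

Expanded prompt example: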

[# Sources <FIRST_FRAME>@Image1] [# References <IMAGE_REF_0>@Image2] a woman <IMAGE_REF_0> is walking. Use Image1 as the starting frame. Use Image2 as a reference for the video generation.
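
The declaration prefix follows a regular pattern, so it can be assembled from the positions of the uploaded images. The sketch below is one way to do that. The function name declare_roles is hypothetical, image positions are 1-based to match the @Image1 and @Image2 labels, and reference tags are numbered from 0 in the order given, which is an assumption based on the examples above.

```python
from typing import Optional, Sequence


def declare_roles(first_frame: Optional[int] = None, references: Sequence[int] = ()) -> str:
    """Build the "[# Sources ...] [# References ...]" prefix from 1-based image positions."""
    blocks = []
    if first_frame is not None:
        blocks.append(f"[# Sources <FIRST_FRAME>@Image{first_frame}]")
    if references:
        # Reference tags count from 0, in the order the images are listed.
        refs = " ".join(
            f"<IMAGE_REF_{index}>@Image{position}"
            for index, position in enumerate(references)
        )
        blocks.append(f"[# References {refs}]")
    return " ".join(blocks)
```

For instance, declare_roles(first_frame=1, references=[2]) yields the prefix used in the expanded prompt example. The scene description and the guidance instruction still need to be appended after it.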

Next steps