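Generate music with Lyria 3
Lyria 3 is part of Google's family of music generation models and is available through the Gemini API. With Lyria 3, you can generate high-quality 44.1 kHz stereo audio from text prompts or images. The models produce structurally coherent music, including vocals, synchronized lyrics, and full instrumental arrangements.
The Lyria 3 family includes two models: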
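| Model | Model ID | Best for | Duration | Output |
|---|---|---|---|---|
| Lyria 3 Clip | lyria-3-clip-preview | Short clips, loops, previews | 30 seconds | MP3 |
| Lyria 3 Pro | lyria-3-pro-preview | Full songs with verses, choruses, and bridges | A few minutes (controllable through the prompt) | MP3 |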
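Both models are available through the new Interactions API, support multimodal input (text and images), and generate 44.1 kHz high-fidelity stereo audio.
Generate music clips
The Lyria 3 Clip model always generates 30-second clips. To generate a clip, call the interactions.create method with a text prompt. The response always includes the generated lyrics and song structure, along with the audio, in the steps schema.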
Python
import base64
from google import genai
client = genai.Client()
interaction = client.interactions.create(
    model="lyria-3-clip-preview",
    input="Create a 30-second cheerful acoustic folk song with guitar and harmonica.",
)
for step in interaction.steps:
    if step.type == "model_output":
        for content_block in step.content:
            if content_block.type == "audio":
                print(f"Generated audio with mime_type: {content_block.mime_type}")
                with open("music.mp3", "wb") as f:
                    f.write(base64.b64decode(content_block.data))
            elif content_block.type == "text":
                print(f"Lyrics: {content_block.text}")
JavaScript
import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';
const client = new GoogleGenAI({});
const interaction = await client.interactions.create({
  model: 'lyria-3-clip-preview',
  input: 'Create a 30-second cheerful acoustic folk song with ' +
         'guitar and harmonica.',
});
for (const step of interaction.steps) {
  if (step.type === 'model_output') {
    for (const contentBlock of step.content) {
      if (contentBlock.type === 'audio') {
        console.log(`Generated audio with mime_type: ${contentBlock.mimeType}`);
        fs.writeFileSync('music.mp3', Buffer.from(contentBlock.data, 'base64'));
      } else if (contentBlock.type === 'text') {
        console.log(`Lyrics: ${contentBlock.text}`);
      }
    }
  }
}
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "model": "lyria-3-clip-preview",
    "input": "Create a 30-second cheerful acoustic folk song with guitar and harmonica."
  }'
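Generate full songs
Use the lyria-3-pro-preview model to generate full songs that run a few minutes. The Pro model understands musical structure and can compose pieces with clearly layered verses, choruses, and bridges. You can influence the song's duration by specifying a length in the prompt (for example, "Create a 2-minute song") or by using timestamps to define the structure.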
Python
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="An epic cinematic orchestral piece about a journey home. Starts with a solo piano intro, builds through sweeping strings, and climaxes with a massive wall of sound.",
)
JavaScript
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: 'An epic cinematic orchestral piece about a journey home. ' +
         'Starts with a solo piano intro, builds through sweeping ' +
         'strings, and climaxes with a massive wall of sound.',
});
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "An epic cinematic orchestral piece about a journey home. Starts with a solo piano intro, builds through sweeping strings, and climaxes with a massive wall of sound."
  }'
Choose an output format
By default, Lyria 3 models generate audio in MP3 format. For Lyria 3 Pro, you can also request WAV output by setting response_mime_type.
Python
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="An atmospheric ambient track.",
    response_modalities=["audio", "text"],
    response_mime_type="audio/wav",
)
JavaScript
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: 'An atmospheric ambient track.',
  responseModalities: ["audio", "text"],
  responseMimeType: "audio/wav",
});
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "An atmospheric ambient track.",
    "responseModalities": ["audio", "text"],
    "responseMimeType": "audio/wav"
  }'
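Because the output can be either MP3 or WAV, it may help to pick the file extension from the MIME type reported on the audio block rather than hard-coding .mp3. The following is a minimal sketch, not part of the API: extension_for and output_filename are hypothetical helpers, "audio/wav" comes from the text above, and the MP3 MIME strings ("audio/mpeg", "audio/mp3") are assumptions about what the API reports.

```python
# Map audio MIME types to file extensions.
# "audio/wav" is the value used in the request above; the MP3 entries are
# assumptions about what the API reports for its default format.
_EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/mp3": ".mp3",
}


def extension_for(mime_type, default=".bin"):
    """Return a file extension for an audio MIME type (hypothetical helper)."""
    # Drop parameters such as "; rate=44100" and normalize case.
    base = (mime_type or "").split(";", 1)[0].strip().lower()
    return _EXTENSIONS.get(base, default)


def output_filename(stem, mime_type):
    """Build a filename such as "music.wav" from a stem and a MIME type."""
    return stem + extension_for(mime_type)


print(output_filename("music", "audio/wav"))
```

In the parsing loop, this would replace the fixed "music.mp3" name with output_filename("music", content_block.mime_type). Unknown types fall back to .bin so that an unexpected format is not silently mislabeled.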
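Parse the response
A Lyria 3 response contains multiple content blocks in the steps schema.
An interaction returns a series of steps, and the model_output step contains the generated content.
Text content blocks contain the generated lyrics or a JSON description of the song structure. Content blocks of type audio contain base64-encoded audio data.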
Python
import base64
# Assumes `interaction` is the result of an earlier interactions.create call.
lyrics = []
audio_data = None
for step in interaction.steps:
    if step.type == "model_output":
        for content_block in step.content:
            if content_block.type == "audio":
                audio_data = base64.b64decode(content_block.data)
            elif content_block.type == "text":
                lyrics.append(content_block.text)
if lyrics:
    print("Lyrics:\n" + "\n".join(lyrics))
if audio_data:
    with open("output.mp3", "wb") as f:
        f.write(audio_data)
JavaScript
const lyrics = [];
let audioData = null;
for (const step of interaction.steps) {
  if (step.type === 'model_output') {
    for (const contentBlock of step.content) {
      if (contentBlock.type === 'audio') {
        audioData = Buffer.from(contentBlock.data, 'base64');
      } else if (contentBlock.type === 'text') {
        lyrics.push(contentBlock.text);
      }
    }
  }
}
if (lyrics.length) {
  console.log("Lyrics:\n" + lyrics.join("\n"));
}
if (audioData) {
  fs.writeFileSync("output.mp3", audioData);
}
REST
# The output from the REST API is a JSON object containing base64 encoded data.
# You can extract the text or the audio data using a tool like jq.
# To extract the audio and save it to a file:
curl ... | jq -r '.steps[] | select(.type=="model_output") | .content[] | select(.type=="audio") | .data' | base64 -d > output.mp3
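If the REST response has been saved or loaded as JSON, the same extraction as the jq filter above can be sketched in Python. This is an illustration under assumptions: extract_outputs is a hypothetical helper, and the sample dictionary is made up to match the shape the jq filter implies (steps, then content blocks with type, text, and data fields). It is not real API output.

```python
import base64


def extract_outputs(response):
    """Collect text blocks and the last audio block from a response dict."""
    texts = []
    audio = None
    for step in response.get("steps", []):
        # Only model_output steps carry generated content.
        if step.get("type") != "model_output":
            continue
        for block in step.get("content", []):
            if block.get("type") == "audio":
                audio = base64.b64decode(block["data"])
            elif block.get("type") == "text":
                texts.append(block["text"])
    return texts, audio


# Made-up sample in the shape the jq filter expects; not real API output.
sample = {
    "steps": [
        {"type": "other_step", "content": []},
        {
            "type": "model_output",
            "content": [
                {"type": "text", "text": "[Verse] la la"},
                {
                    "type": "audio",
                    "data": base64.b64encode(b"fake-audio").decode("ascii"),
                },
            ],
        },
    ]
}

texts, audio = extract_outputs(sample)
print(texts, len(audio))
```

Using .get with empty defaults keeps the helper tolerant of steps that have no content list, at the cost of hiding malformed responses. Stricter code might raise instead.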
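Generate music from images
Lyria 3 supports multimodal input. You can provide up to 10 images along with a text prompt in the input list, and the model composes music based on the visual content.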
Python
uploaded_image = client.files.upload(file="desert_sunset.jpg")
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=[
        {"type": "text", "text": "An atmospheric ambient track inspired by the mood and colors in this image."},
        {
            "type": "image",
            "uri": uploaded_image.uri,
            "mime_type": uploaded_image.mime_type
        }
    ],
)
JavaScript
const uploadedImage = await client.files.upload({
  file: "desert_sunset.jpg",
  config: { mimeType: "image/jpeg" }
});
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: [
    { type: 'text', text: 'An atmospheric ambient track inspired by the mood and colors in this image.' },
    {
      type: 'image',
      uri: uploadedImage.uri,
      mimeType: uploadedImage.mimeType
    }
  ],
});
REST
# First upload the image using the Files API, then use the URI:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": [
      {"type": "text", "text": "An atmospheric ambient track inspired by the mood and colors in this image."},
      {"type": "image", "uri": "YOUR_FILE_URI", "mime_type": "image/jpeg"}
    ]
  }'
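When several images are involved, assembling the input list by hand gets repetitive, and the 10-image limit is easy to exceed. A minimal sketch of a builder follows. build_image_input and MAX_IMAGES are hypothetical names, and the file URIs in the usage lines are placeholders. The entry shape mirrors the examples above.

```python
MAX_IMAGES = 10  # limit stated in the text above


def build_image_input(prompt, images):
    """Build an `input` list from a text prompt and (uri, mime_type) pairs."""
    images = list(images)
    if len(images) > MAX_IMAGES:
        raise ValueError(
            f"at most {MAX_IMAGES} images are supported, got {len(images)}"
        )
    # The text prompt comes first, followed by one entry per image.
    parts = [{"type": "text", "text": prompt}]
    for uri, mime_type in images:
        parts.append({"type": "image", "uri": uri, "mime_type": mime_type})
    return parts


# Placeholder URIs for illustration only.
parts = build_image_input(
    "An ambient track inspired by these images.",
    [("files/a", "image/jpeg"), ("files/b", "image/png")],
)
print(len(parts))
```

The returned list could then be passed as the input argument of interactions.create. Checking the limit on the client side avoids a round trip for a request that is expected to be rejected.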
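Provide custom lyrics
You can write your own lyrics and include them in the prompt. Use section markers such as [Verse], [Chorus], and [Bridge] to help the model understand the song structure: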
Python
prompt = """
Create a dreamy indie pop song with the following lyrics:
[Verse 1]
Walking through the neon glow,
city lights reflect below,
every shadow tells a story,
every corner, fading glory.
[Chorus]
We are the echoes in the night,
burning brighter than the light,
hold on tight, don't let me go,
we are the echoes down below.
[Verse 2]
Footsteps lost on empty streets,
rhythms sync to heartbeats,
whispers carried by the breeze,
dancing through the autumn leaves.
"""
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=prompt,
)
JavaScript
const prompt = `
Create a dreamy indie pop song with the following lyrics:
[Verse 1]
Walking through the neon glow,
city lights reflect below,
every shadow tells a story,
every corner, fading glory.
[Chorus]
We are the echoes in the night,
burning brighter than the light,
hold on tight, don't let me go,
we are the echoes down below.
[Verse 2]
Footsteps lost on empty streets,
rhythms sync to heartbeats,
whispers carried by the breeze,
dancing through the autumn leaves.
`;
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: prompt,
});
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "Create a dreamy indie pop song with the following lyrics: ..."
  }'
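If lyrics are stored as structured data rather than one long string, a small helper can assemble the prompt with section markers. This is a sketch only: build_lyrics_prompt is a hypothetical name, and the layout (direction first, a blank line, then marked sections) is one reasonable way to keep the musical direction apart from the lyrics.

```python
def build_lyrics_prompt(direction, sections):
    """Join a musical direction with (marker, lines) lyric sections."""
    # A blank line separates the direction from the lyrics.
    out = [direction.strip(), ""]
    for marker, lines in sections:
        out.append(f"[{marker}]")
        out.extend(line.strip() for line in lines)
    return "\n".join(out)


prompt = build_lyrics_prompt(
    "Create a dreamy indie pop song with the following lyrics:",
    [
        ("Verse 1", ["Walking through the neon glow,", "city lights reflect below,"]),
        ("Chorus", ["We are the echoes in the night,"]),
    ],
)
print(prompt)
```

The resulting string can be passed as input in the same way as the hand-written prompt above.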
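Control timing and structure
You can use timestamps to specify exactly what happens at particular moments in the song. This is useful for controlling when instruments enter, when lyrics appear, and how the song progresses: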
Python
prompt = """
[0:00 - 0:10] Intro: Begin with a soft lo-fi beat and muffled
vinyl crackle.
[0:10 - 0:30] Verse 1: Add a warm Fender Rhodes piano melody
and gentle vocals singing about a rainy morning.
[0:30 - 0:50] Chorus: Full band with upbeat drums and soaring
synth leads. The lyrics are hopeful and uplifting.
[0:50 - 1:00] Outro: Fade out with the piano melody alone.
"""
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=prompt,
)
JavaScript
const prompt = `
[0:00 - 0:10] Intro: Begin with a soft lo-fi beat and muffled
vinyl crackle.
[0:10 - 0:30] Verse 1: Add a warm Fender Rhodes piano melody
and gentle vocals singing about a rainy morning.
[0:30 - 0:50] Chorus: Full band with upbeat drums and soaring
synth leads. The lyrics are hopeful and uplifting.
[0:50 - 1:00] Outro: Fade out with the piano melody alone.
`;
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: prompt,
});
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "[0:00 - 0:10] Intro: ..."
  }'
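Writing timestamps by hand makes it easy to leave gaps or overlaps between sections. One option is to derive the timestamps from segment durations, as in the sketch below. format_timestamp and build_timed_prompt are hypothetical helpers, and the "[m:ss - m:ss] Label: description" layout simply follows the example prompt above.

```python
def format_timestamp(seconds):
    """Format whole seconds as m:ss, for example 70 -> "1:10"."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def build_timed_prompt(segments):
    """Build a timestamped prompt from (duration_seconds, label, description)."""
    lines = []
    start = 0
    for duration, label, description in segments:
        if duration <= 0:
            raise ValueError("each segment needs a positive duration")
        end = start + duration
        lines.append(
            f"[{format_timestamp(start)} - {format_timestamp(end)}] "
            f"{label}: {description}"
        )
        # The next segment starts where this one ends, so there are no gaps.
        start = end
    return "\n".join(lines)


timed_prompt = build_timed_prompt([
    (10, "Intro", "Begin with a soft lo-fi beat and muffled vinyl crackle."),
    (20, "Verse 1", "Add a warm Fender Rhodes piano melody and gentle vocals."),
    (20, "Chorus", "Full band with upbeat drums and soaring synth leads."),
    (10, "Outro", "Fade out with the piano melody alone."),
])
print(timed_prompt)
```

Because each segment starts where the previous one ends, the total duration is the sum of the segment durations, here one minute.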
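Generate instrumental tracks
For background music, game soundtracks, or any use case that doesn't need vocals, you can prompt the model to generate an instrumental-only track: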
Python
interaction = client.interactions.create(
    model="lyria-3-clip-preview",
    input="A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals.",
)
JavaScript
const interaction = await client.interactions.create({
  model: 'lyria-3-clip-preview',
  input: 'A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals.',
});
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-clip-preview",
    "input": "A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals."
  }'
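Generate music in different languages
Lyria 3 generates lyrics in the language of the prompt. To generate a song with French lyrics, write the prompt in French. The model adapts its vocal style and pronunciation to match the language.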
Python
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique.",
)
JavaScript
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: 'Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique.',
});
REST
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique."
  }'
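Model intelligence
Lyria 3 analyzes your prompt through a reasoning process in which the model infers the musical structure (intro, verse, chorus, bridge, and so on) from what you wrote. This happens before the audio is generated, to ensure structural coherence and musicality.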
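Prompting guide
The more specific the prompt, the better the result. You can include the following to guide generation:
- Genre: Specify one or more genres (for example, "lo-fi hip hop", "jazz fusion", "cinematic orchestral").
- Instruments: Name specific instruments (for example, "Fender Rhodes piano", "slide guitar", "TR-808 drum machine").
- BPM: Set the tempo (for example, "120 BPM", "slow tempo, around 70 BPM").
- Key/scale: Specify the musical key (for example, "G major", "D minor").
- Mood and atmosphere: Use descriptive adjectives (for example, "nostalgic", "aggressive", "ethereal", "dreamy").
- Structure: Use markers such as [Verse], [Chorus], [Bridge], [Intro], and [Outro], or timestamps, to control how the song progresses.
- Duration: The Clip model always generates 30-second clips. For the Pro model, specify the desired duration in the prompt (for example, "Create a 2-minute song") or use timestamps to control the duration.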
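The elements in the list above can be collected into a prompt programmatically, which helps when generating many variations. The following is a minimal sketch under assumptions: compose_prompt is a hypothetical helper, and the labelled-sentence layout it produces is just one way to state each element explicitly, not a format the model requires.

```python
def compose_prompt(genre, instruments=(), bpm=None, key=None, moods=(),
                   duration=None, instrumental=False):
    """Compose a prompt from the elements in the prompting guide."""
    sentences = [f"Genre: {genre}."]
    if instruments:
        sentences.append("Instruments: " + ", ".join(instruments) + ".")
    if bpm is not None:
        sentences.append(f"Tempo: {bpm} BPM.")
    if key:
        sentences.append(f"Key: {key}.")
    if moods:
        sentences.append("Mood: " + ", ".join(moods) + ".")
    if duration:
        sentences.append(f"Duration: {duration}.")
    if instrumental:
        # Same wording as the instrumental example earlier in this page.
        sentences.append("Instrumental only, no vocals.")
    return " ".join(sentences)


guide_prompt = compose_prompt(
    "lo-fi hip hop",
    instruments=["Rhodes piano", "upright bass"],
    bpm=85,
    instrumental=True,
)
print(guide_prompt)
```

Elements that are left unset are simply omitted, so the same helper covers both short and detailed prompts.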
Example prompts
Here are some examples of effective prompts:
- "A 30-second lofi hip hop beat with dusty vinyl crackle, mellow Rhodes piano chords, a slow boom-bap drum pattern at 85 BPM, and a jazzy upright bass line. Instrumental only."
- "An upbeat, feel-good pop song in G major at 120 BPM with bright acoustic guitar strumming, claps, and warm vocal harmonies about a summer road trip."
- "A dark, atmospheric trap beat at 140 BPM with heavy 808 bass, eerie synth pads, sharp hi-hats, and a haunting vocal sample. In D minor."
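Best practices
- Iterate with Clip first. Try out prompts with the faster lyria-3-clip-preview model before generating a full song with lyria-3-pro-preview.
- Be specific. Vague prompts produce generic results. Mention instruments, BPM, key, mood, and structure to get the best output.
- Match the language. Write the prompt in the language you want the lyrics in.
- Use section markers. [Verse], [Chorus], and [Bridge] markers give the model a clear structure to follow.
- Separate lyrics from instructions. When providing custom lyrics, keep them clearly separate from the musical direction.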
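Limitations
- Safety: All prompts are checked by safety filters. Prompts that trigger the filters are blocked. This includes prompts that request a specific artist's voice or ask for copyrighted lyrics.
- Watermarking: All generated audio includes a SynthID audio watermark for identification. The watermark is inaudible to the human ear and doesn't affect the listening experience.
- Multi-turn editing: Music generation is a single-turn process. The current version of Lyria 3 doesn't support iteratively editing or refining a generated clip through multiple prompts.
- Duration: The Clip model always generates 30-second clips. The Pro model generates songs a few minutes long; the exact duration is influenced by the prompt.
- Determinism: Results can vary between calls, even with the same prompt.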