Generate music with Lyria 3
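Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3 you can generate high-quality 44.1 kHz stereo audio from text prompts or images. The models produce structurally coherent output, including vocals, timed lyrics, and full instrumental arrangements.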
The Lyria 3 family includes two models:
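| Model | Model ID | Best for | Duration | Output |
|---|---|---|---|---|
| Lyria 3 Clip | lyria-3-clip-preview | Short clips, loops, previews | 30 seconds | MP3 |
| Lyria 3 Pro | lyria-3-pro-preview | Full songs with verses, choruses, and bridges | Several minutes (controllable through the prompt) | MP3 |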
Both models are available through the new Interactions API, accept multimodal input (text and images), and produce 44.1 kHz high-fidelity stereo audio.
Generate a music clip
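The Lyria 3 Clip model always generates a 30-second clip. To generate one, call the interactions.create method with a text prompt. The response always includes the generated lyrics and song structure, along with the audio, in the steps structure.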
Python
```python
import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="lyria-3-clip-preview",
    input="Create a 30-second cheerful acoustic folk song with guitar and harmonica.",
)

for step in interaction.steps:
    if step.type == "model_output":
        for content_block in step.content:
            if content_block.type == "audio":
                print(f"Generated audio with mime_type: {content_block.mime_type}")
                with open("music.mp3", "wb") as f:
                    f.write(base64.b64decode(content_block.data))
            elif content_block.type == "text":
                print(f"Lyrics: {content_block.text}")
```
JavaScript
```javascript
import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
  model: 'lyria-3-clip-preview',
  input: 'Create a 30-second cheerful acoustic folk song with ' +
    'guitar and harmonica.',
});

for (const step of interaction.steps) {
  if (step.type === 'model_output') {
    for (const contentBlock of step.content) {
      if (contentBlock.type === 'audio') {
        console.log(`Generated audio with mime_type: ${contentBlock.mimeType}`);
        fs.writeFileSync('music.mp3', Buffer.from(contentBlock.data, 'base64'));
      } else if (contentBlock.type === 'text') {
        console.log(`Lyrics: ${contentBlock.text}`);
      }
    }
  }
}
```
REST
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "model": "lyria-3-clip-preview",
    "input": "Create a 30-second cheerful acoustic folk song with guitar and harmonica."
  }'
```
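If you would rather not install an SDK, the REST call above can be reproduced with Python's standard library. The following is a minimal sketch that only mirrors the endpoint, headers, and JSON body from the curl example; the helper name build_interaction_request is hypothetical, and it builds the request without sending it.

```python
import json
import os
import urllib.request

# Endpoint copied from the curl examples in this guide.
INTERACTIONS_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"


def build_interaction_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        INTERACTIONS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_interaction_request(
        "lyria-3-clip-preview",
        "Create a 30-second cheerful acoustic folk song with guitar and harmonica.",
        os.environ.get("GEMINI_API_KEY", ""),
    )
    # Only shows what would be sent; nothing goes over the network here.
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; the response body is JSON that you can load with json.load and then read through the steps structure.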
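Generate a full song
Use the lyria-3-pro-preview model to generate full songs that run several minutes. The Pro model understands musical structure and can compose pieces with distinct verses, choruses, and bridges. To influence the duration, state it in the prompt (for example, "Create a 2-minute song") or use timestamps to define the structure.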
Python
```python
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="An epic cinematic orchestral piece about a journey home. Starts with a solo piano intro, builds through sweeping strings, and climaxes with a massive wall of sound.",
)
```
JavaScript
```javascript
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: 'An epic cinematic orchestral piece about a journey home. ' +
    'Starts with a solo piano intro, builds through sweeping ' +
    'strings, and climaxes with a massive wall of sound.',
});
```
REST
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "An epic cinematic orchestral piece about a journey home. Starts with a solo piano intro, builds through sweeping strings, and climaxes with a massive wall of sound."
  }'
```
Choose the output format
By default, Lyria 3 models generate audio in MP3 format. With Lyria 3 Pro, you can also request WAV output by setting response_mime_type.
Python
```python
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="An atmospheric ambient track.",
    response_modalities=["audio", "text"],
    response_mime_type="audio/wav",
)
```
JavaScript
```javascript
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: 'An atmospheric ambient track.',
  responseModalities: ["audio", "text"],
  responseMimeType: "audio/wav",
});
```
REST
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "An atmospheric ambient track.",
    "responseModalities": ["audio", "text"],
    "responseMimeType": "audio/wav"
  }'
```
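The examples in this guide write audio to a fixed .mp3 file name, so a WAV response from Lyria 3 Pro would be saved with the wrong extension. One option, sketched below under the assumption that the decoded bytes follow the standard file signatures (RIFF/WAVE for WAV, an ID3 tag or an MPEG frame sync for MP3), is to inspect the first bytes before saving. The functions sniff_audio_extension and save_audio are hypothetical and not part of any Gemini SDK.

```python
def sniff_audio_extension(data: bytes) -> str:
    """Guess a file extension from the leading bytes of decoded audio."""
    # WAV files are RIFF containers with the form type "WAVE" at byte offset 8.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files often begin with an ID3v2 metadata tag...
    if data[:3] == b"ID3":
        return "mp3"
    # ...or directly with an MPEG audio frame, whose first 11 bits are all set.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    # Unknown signature: keep the bytes but don't claim a specific format.
    return "bin"


def save_audio(data: bytes, stem: str = "output") -> str:
    """Write audio bytes to '<stem>.<ext>' and return the file name used."""
    path = f"{stem}.{sniff_audio_extension(data)}"
    with open(path, "wb") as f:
        f.write(data)
    return path
```

In the response-parsing loop, you would call save_audio(base64.b64decode(content_block.data)) instead of opening a hard-coded file name.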
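Parse the response
A Lyria 3 response contains multiple content blocks within the steps structure. The interaction returns a series of steps, and the model_output step holds the generated content. Text content blocks contain the generated lyrics or a JSON description of the song structure. Content blocks of type audio contain Base64-encoded audio data.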
Python
```python
import base64

lyrics = []
audio_data = None

for step in interaction.steps:
    if step.type == "model_output":
        for content_block in step.content:
            if content_block.type == "audio":
                audio_data = base64.b64decode(content_block.data)
            elif content_block.type == "text":
                lyrics.append(content_block.text)

if lyrics:
    print("Lyrics:\n" + "\n".join(lyrics))

# MP3 is the default format; use a .wav file name if you requested audio/wav.
if audio_data:
    with open("output.mp3", "wb") as f:
        f.write(audio_data)
```
JavaScript
```javascript
const lyrics = [];
let audioData = null;

for (const step of interaction.steps) {
  if (step.type === 'model_output') {
    for (const contentBlock of step.content) {
      if (contentBlock.type === 'audio') {
        audioData = Buffer.from(contentBlock.data, 'base64');
      } else if (contentBlock.type === 'text') {
        lyrics.push(contentBlock.text);
      }
    }
  }
}

if (lyrics.length) {
  console.log("Lyrics:\n" + lyrics.join("\n"));
}
if (audioData) {
  fs.writeFileSync("output.mp3", audioData);
}
```
REST
```bash
# The output from the REST API is a JSON object containing base64 encoded data.
# You can extract the text or the audio data using a tool like jq.
# To extract the audio and save it to a file:
curl ... | jq -r '.steps[] | select(.type=="model_output") | .content[] | select(.type=="audio") | .data' | base64 -d > output.mp3
```
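The jq filter above can also be expressed in Python when you call the REST endpoint directly. This sketch assumes the raw JSON uses the same field names as the jq expression (steps, type, content, data) and that text blocks carry their content in a text field, as the SDK's content_block.text suggests; parse_interaction is a hypothetical helper.

```python
from __future__ import annotations

import base64


def parse_interaction(payload: dict) -> tuple[list[str], bytes | None]:
    """Return (text blocks, decoded audio) from the model_output steps.

    Field names are assumed to match the jq filter used with the REST API.
    """
    texts: list[str] = []
    audio: bytes | None = None
    for step in payload.get("steps", []):
        # Only model_output steps carry generated lyrics and audio.
        if step.get("type") != "model_output":
            continue
        for block in step.get("content", []):
            if block.get("type") == "audio":
                audio = base64.b64decode(block["data"])
            elif block.get("type") == "text":
                texts.append(block.get("text", ""))
    return texts, audio
```

For example, texts, audio = parse_interaction(json.loads(response_text)) gives you the lyrics (or structure description) and the raw audio bytes in one pass.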
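Generate music from images
Lyria 3 supports multimodal input: you can supply up to 10 images along with a text prompt in the input list, and the model composes music inspired by the visual content.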
Python
```python
uploaded_image = client.files.upload(file="desert_sunset.jpg")

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=[
        {"type": "text", "text": "An atmospheric ambient track inspired by the mood and colors in this image."},
        {
            "type": "image",
            "uri": uploaded_image.uri,
            "mime_type": uploaded_image.mime_type
        }
    ],
)
```
JavaScript
```javascript
const uploadedImage = await client.files.upload({
  file: "desert_sunset.jpg",
  config: { mimeType: "image/jpeg" }
});

const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: [
    { type: 'text', text: 'An atmospheric ambient track inspired by the mood and colors in this image.' },
    {
      type: 'image',
      uri: uploadedImage.uri,
      mimeType: uploadedImage.mimeType
    }
  ],
});
```
REST
```bash
# First upload the image using the Files API, then use the URI:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": [
      {"type": "text", "text": "An atmospheric ambient track inspired by the mood and colors in this image."},
      {"type": "image", "uri": "YOUR_FILE_URI", "mime_type": "image/jpeg"}
    ]
  }'
```
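When you pass several images, it can help to build the input list programmatically and check the 10-image limit before making the request. The sketch below reproduces the part shape used in the Python and REST examples above (one text part followed by image parts with uri and mime_type); build_image_input is a hypothetical helper, and the (uri, mime_type) pairs would typically come from files.upload.

```python
from __future__ import annotations

MAX_IMAGES = 10  # Image limit for Lyria 3 as stated in this guide.


def build_image_input(prompt: str, images: list[tuple[str, str]]) -> list[dict]:
    """Return an input list: one text part followed by one part per image.

    `images` holds (uri, mime_type) pairs, for example taken from the
    uri and mime_type attributes of uploaded files.
    """
    if not 1 <= len(images) <= MAX_IMAGES:
        raise ValueError(f"expected 1 to {MAX_IMAGES} images, got {len(images)}")
    parts: list[dict] = [{"type": "text", "text": prompt}]
    for uri, mime_type in images:
        parts.append({"type": "image", "uri": uri, "mime_type": mime_type})
    return parts
```

The result can be passed straight to client.interactions.create(model="lyria-3-pro-preview", input=...), exactly where the hand-written list appears in the Python example.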
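Provide custom lyrics
You can write your own lyrics and include them in the prompt. Use section tags such as [Verse], [Chorus], and [Bridge] to help the model understand the song's structure: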
Python
```python
prompt = """
Create a dreamy indie pop song with the following lyrics:
[Verse 1]
Walking through the neon glow,
city lights reflect below,
every shadow tells a story,
every corner, fading glory.
[Chorus]
We are the echoes in the night,
burning brighter than the light,
hold on tight, don't let me go,
we are the echoes down below.
[Verse 2]
Footsteps lost on empty streets,
rhythms sync to heartbeats,
whispers carried by the breeze,
dancing through the autumn leaves.
"""

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=prompt,
)
```
JavaScript
```javascript
const prompt = `
Create a dreamy indie pop song with the following lyrics:
[Verse 1]
Walking through the neon glow,
city lights reflect below,
every shadow tells a story,
every corner, fading glory.
[Chorus]
We are the echoes in the night,
burning brighter than the light,
hold on tight, don't let me go,
we are the echoes down below.
[Verse 2]
Footsteps lost on empty streets,
rhythms sync to heartbeats,
whispers carried by the breeze,
dancing through the autumn leaves.
`;

const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: prompt,
});
```
REST
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "Create a dreamy indie pop song with the following lyrics: ..."
  }'
```
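The REST example elides the lyrics, partly because multi-line text has to be escaped inside a JSON string. A minimal sketch, assuming you keep the musical direction and the lyrics as separate pieces (as recommended under best practices), assembles the prompt with section tags and lets json.dumps handle the escaping; build_lyrics_prompt is a hypothetical helper.

```python
from __future__ import annotations

import json


def build_lyrics_prompt(direction: str, sections: list[tuple[str, list[str]]]) -> str:
    """Put the musical direction first, then each section tag and its lyric lines."""
    lines = [direction.strip()]
    for tag, lyric_lines in sections:
        lines.append(f"[{tag}]")
        lines.extend(lyric_lines)
    return "\n".join(lines)


prompt = build_lyrics_prompt(
    "Create a dreamy indie pop song with the following lyrics:",
    [
        ("Verse 1", ["Walking through the neon glow,", "city lights reflect below,"]),
        ("Chorus", ["We are the echoes in the night,", "burning brighter than the light,"]),
    ],
)

# json.dumps escapes the newlines, so the body is a valid single JSON document.
body = json.dumps({"model": "lyria-3-pro-preview", "input": prompt})
print(body)
```

You could write body to a file and send it with curl --data-binary @body.json instead of pasting the lyrics into the -d argument by hand.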
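Control timing and structure
You can use timestamps to specify what should happen at particular points in the song. This helps you control when instruments enter, when the lyrics are delivered, and how the song progresses: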
Python
```python
prompt = """
[0:00 - 0:10] Intro: Begin with a soft lo-fi beat and muffled
vinyl crackle.
[0:10 - 0:30] Verse 1: Add a warm Fender Rhodes piano melody
and gentle vocals singing about a rainy morning.
[0:30 - 0:50] Chorus: Full band with upbeat drums and soaring
synth leads. The lyrics are hopeful and uplifting.
[0:50 - 1:00] Outro: Fade out with the piano melody alone.
"""

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=prompt,
)
```
JavaScript
```javascript
const prompt = `
[0:00 - 0:10] Intro: Begin with a soft lo-fi beat and muffled
vinyl crackle.
[0:10 - 0:30] Verse 1: Add a warm Fender Rhodes piano melody
and gentle vocals singing about a rainy morning.
[0:30 - 0:50] Chorus: Full band with upbeat drums and soaring
synth leads. The lyrics are hopeful and uplifting.
[0:50 - 1:00] Outro: Fade out with the piano melody alone.
`;

const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: prompt,
});
```
REST
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "[0:00 - 0:10] Intro: ..."
  }'
```
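Timestamped prompts are easy to get subtly wrong through gaps, overlaps, or inconsistent m:ss formatting. The sketch below generates lines in the same "[m:ss - m:ss] Label: description" form as the example above and rejects segments that don't follow one another. The contiguity check is our own sanity rule, not a documented API requirement, and fmt and build_timed_prompt are hypothetical helpers. The last segment's end time also tells you the total length you are asking the Pro model for.

```python
from __future__ import annotations


def fmt(seconds: int) -> str:
    """Format whole seconds as m:ss, e.g. 70 -> '1:10'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def build_timed_prompt(segments: list[tuple[int, int, str, str]]) -> str:
    """Render (start_s, end_s, label, description) tuples as timestamped lines.

    Segments must start at 0, and each one must begin where the previous
    one ended (a self-imposed rule, not an API requirement).
    """
    lines = []
    expected_start = 0
    for start, end, label, description in segments:
        if start != expected_start or end <= start:
            raise ValueError(f"segment {label!r} does not continue the timeline at {fmt(expected_start)}")
        lines.append(f"[{fmt(start)} - {fmt(end)}] {label}: {description}")
        expected_start = end
    return "\n".join(lines)


prompt = build_timed_prompt([
    (0, 10, "Intro", "Begin with a soft lo-fi beat and muffled vinyl crackle."),
    (10, 30, "Verse 1", "Add a warm Fender Rhodes piano melody and gentle vocals singing about a rainy morning."),
    (30, 50, "Chorus", "Full band with upbeat drums and soaring synth leads. The lyrics are hopeful and uplifting."),
    (50, 60, "Outro", "Fade out with the piano melody alone."),
])
print(prompt)
```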
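Generate instrumental tracks
For background music, game soundtracks, or anything else that doesn't need vocals, prompt the model to generate an instrumental-only track: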
Python
```python
interaction = client.interactions.create(
    model="lyria-3-clip-preview",
    input="A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals.",
)
```
JavaScript
```javascript
const interaction = await client.interactions.create({
  model: 'lyria-3-clip-preview',
  input: 'A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals.',
});
```
REST
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-clip-preview",
    "input": "A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals."
  }'
```
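Generate music in other languages
Lyria 3 writes lyrics in the language of your prompt. To generate a song with French lyrics, write the prompt in French. The model adapts its vocal style and pronunciation to the language.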
Python
```python
interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique.",
)
```
JavaScript
```javascript
const interaction = await client.interactions.create({
  model: 'lyria-3-pro-preview',
  input: 'Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique.',
});
```
REST
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique."
  }'
```
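Model intelligence
Lyria 3 analyzes your prompt and reasons through its musical structure (intro, verse, chorus, bridge, and so on). This step runs before any audio is generated, which keeps the result structurally coherent and musical.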
Prompting guide
The more specific your prompt, the more closely the result matches what you want. To steer generation, you can include:
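- Genre: Name a genre or a blend of genres (for example, "lo-fi hip hop", "jazz fusion", "cinematic orchestral").
- Instruments: Name specific instruments (for example, "Fender Rhodes piano", "slide guitar", "TR-808 drum machine").
- BPM: Set the tempo (for example, "120 BPM", "slow tempo around 70 BPM").
- Key/scale: Specify the musical key (for example, "G major", "D minor").
- Mood and atmosphere: Use descriptive adjectives (for example, "nostalgic", "aggressive", "ethereal", "dreamy").
- Structure: Use tags such as [Verse], [Chorus], [Bridge], [Intro], [Outro], or timestamps to control how the song progresses.
- Length: The Clip model always generates 30-second clips. For the Pro model, state the intended length in the prompt (for example, "Create a 2-minute song") or use timestamps to control it.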
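The checklist above maps naturally onto a small structured brief. This is a hypothetical sketch (MusicBrief and its field names are ours, not part of the Gemini API) that turns the attributes into a single descriptive English prompt; the resulting string is passed as input just like the hand-written prompts in this guide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MusicBrief:
    genre: str
    mood: list[str] = field(default_factory=list)
    instruments: list[str] = field(default_factory=list)
    bpm: int | None = None
    key: str | None = None
    length: str | None = None  # e.g. "2-minute"; only the Pro model honors longer lengths
    instrumental: bool = False

    def to_prompt(self) -> str:
        # Length and mood words precede the genre, e.g. "30-second mellow lo-fi hip hop".
        head = " ".join(p for p in (self.length, ", ".join(self.mood), self.genre) if p)
        article = "An" if head[:1].lower() in ("a", "e", "i", "o", "u") else "A"
        text = f"{article} {head} track"
        if self.key:
            text += f" in {self.key}"
        if self.bpm:
            text += f" at {self.bpm} BPM"
        if self.instruments:
            text += " featuring " + ", ".join(self.instruments)
        text += "."
        if self.instrumental:
            text += " Instrumental only, no vocals."
        return text


if __name__ == "__main__":
    brief = MusicBrief(
        genre="lo-fi hip hop",
        mood=["mellow"],
        instruments=["Rhodes piano", "upright bass"],
        bpm=85,
        key="D minor",
        length="30-second",
        instrumental=True,
    )
    print(brief.to_prompt())
```

A plain string template would work just as well; the dataclass simply makes it harder to forget one of the attributes when you iterate on prompts.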
Example prompts
Here are a few effective prompts:
- "A 30-second lofi hip hop beat with dusty vinyl crackle, mellow Rhodes piano chords, a slow boom-bap drum pattern at 85 BPM, and a jazzy upright bass line. Instrumental only."
- "An upbeat, feel-good pop song in G major at 120 BPM with bright acoustic guitar strumming, claps, and warm vocal harmonies about a summer road trip."
- "A dark, atmospheric trap beat at 140 BPM with heavy 808 bass, eerie synth pads, sharp hi-hats, and a haunting vocal sample. In D minor."
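Best practices
- Iterate with Clip first. Test prompts with the faster lyria-3-clip-preview model, then generate full-length content with lyria-3-pro-preview.
- Be specific. Vague prompts produce generic results. Mention instruments, BPM, key, mood, and structure for the best output.
- Match the language. Write the prompt in the language you want the lyrics in.
- Use section tags. [Verse], [Chorus], and [Bridge] tags give the model a clear structure.
- Keep lyrics and instructions separate. When you provide custom lyrics, clearly separate the lyrics from your musical directions.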
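Limitations
- Safety: All prompts are checked by safety filters, and prompts that trigger them are blocked. This includes prompts that request a specific artist's voice or ask for copyrighted lyrics.
- Watermarking: All generated audio carries a SynthID audio watermark for identification. The watermark is inaudible and doesn't affect the listening experience.
- Multi-turn editing: Music generation is a single-turn process. The current version of Lyria 3 doesn't support iteratively editing or refining a generated clip across multiple prompts.
- Length: The Clip model always generates 30-second clips. The Pro model generates songs several minutes long; the exact duration depends on the prompt.
- Determinism: Results can differ from call to call, even with the same prompt.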