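Generate music with Lyria 3

Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality 44.1 kHz stereo audio from text prompts or images. The models deliver structural coherence, including vocals, timed lyrics, and full instrumental arrangements.

The Lyria 3 family includes two models:

Model	Model ID	Best for	Duration	Output
Lyria 3 Clip	`lyria-3-clip-preview`	Short clips, loops, and previews	30 seconds	MP3
Lyria 3 Pro	`lyria-3-pro-preview`	Full-length songs with verses, choruses, and bridges	A few minutes (controllable via the prompt)	MP3

Both models are available through the new Interactions API, accept multimodal input (text and images), and generate high-fidelity 44.1 kHz stereo audio.

Generate a music clip

The Lyria 3 Clip model always generates a 30-second clip. To generate one, call the interactions.create method with a text prompt. The response always includes the generated lyrics and song structure alongside the audio, in the steps schema.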

Python

import base64
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="lyria-3-clip-preview",
    input="A short instrumental acoustic guitar piece.",
)

generated_audio = interaction.output_audio
if generated_audio:
    with open("music.mp3", "wb") as f:
        f.write(base64.b64decode(generated_audio.data))

lyrics = interaction.output_text
if lyrics:
    print(f"Lyrics:\n{lyrics}")

JavaScript

import { GoogleGenAI } from '@google/genai';
import * as fs from 'fs';

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
    model: 'lyria-3-clip-preview',
    input: 'A short instrumental acoustic guitar piece.',
});

const generatedAudio = interaction.output_audio;
if (generatedAudio) {
  fs.writeFileSync('music.mp3', Buffer.from(generatedAudio.data, 'base64'));
}

const lyrics = interaction.output_text;
if (lyrics) {
  console.log(`Lyrics:\n${lyrics}`);
}

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "lyria-3-clip-preview",
    "input": "A short instrumental acoustic guitar piece."
}'

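You can retrieve the generated music through the interaction.output_audio property, which returns the last audio block generated. You can also retrieve the song's lyrics and structure through the interaction.output_text property. For more on these convenience properties, see the Interactions overview.

Generate a full-length song

Use the lyria-3-pro-preview model to generate full-length songs that run for a few minutes. The Pro model understands musical structure and can create compositions with distinct verses, choruses, and bridges. You can influence the duration by stating it in the prompt (for example, "create a 2-minute song") or by using timestamps to define the structure.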

Python

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="An epic cinematic orchestral piece about a journey home. Starts with a solo piano intro, builds through sweeping strings, and climaxes with a massive wall of sound.",
)

JavaScript

const interaction = await client.interactions.create({
    model: 'lyria-3-pro-preview',
    input: 'A beautiful piano melody.',
});

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
-H "Content-Type: application/json" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-d '{
    "model": "lyria-3-pro-preview",
    "input": "A beautiful piano melody."
}'

Choose the output format

By default, Lyria 3 models generate audio in MP3 format. With Lyria 3 Pro, you can also request WAV output by setting response_format.

Python

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="A beautiful piano melody.",
    response_format={"type": "audio"},
)

JavaScript

const interaction = await client.interactions.create({
    model: 'lyria-3-pro-preview',
    input: 'A beautiful piano melody.',
    response_format: {
        type: 'audio',
    },
});

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "A beautiful piano melody.",
    "response_format": {
        "type": "audio"
    }
  }'
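
Because the container format can differ between requests, it may be useful to choose the file extension from the decoded bytes instead of hard-coding `.mp3`. The following is a minimal sketch under stated assumptions: `guess_audio_extension` and `decode_audio` are hypothetical helpers, not part of the SDK, and they rely only on standard file signatures (a WAV file starts with a `RIFF` header whose form type is `WAVE`; an MP3 file starts with an `ID3` tag or an MPEG frame sync).

```python
import base64
from typing import Tuple


def guess_audio_extension(audio_bytes: bytes) -> str:
    """Guess a file extension from the leading bytes of decoded audio.

    Hypothetical helper (not part of the google-genai SDK).
    """
    # WAV: RIFF container whose form type (bytes 8-11) is "WAVE".
    if audio_bytes[:4] == b"RIFF" and audio_bytes[8:12] == b"WAVE":
        return ".wav"
    # MP3 with an ID3v2 tag at the start of the file.
    if audio_bytes[:3] == b"ID3":
        return ".mp3"
    # MP3 without a tag: an MPEG audio frame starts with 11 set sync bits.
    if len(audio_bytes) >= 2 and audio_bytes[0] == 0xFF and (audio_bytes[1] & 0xE0) == 0xE0:
        return ".mp3"
    # Unknown signature: fall back to a neutral extension.
    return ".bin"


def decode_audio(b64_data: str) -> Tuple[bytes, str]:
    """Decode a base64 audio payload and return (bytes, extension)."""
    audio_bytes = base64.b64decode(b64_data)
    return audio_bytes, guess_audio_extension(audio_bytes)
```

With the SDK, the base64 string would come from `interaction.output_audio.data`, as in the examples in this guide.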

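Parse the response

A response from Lyria 3 contains multiple content blocks within the steps schema. An interaction returns a sequence of steps, and the model_output step holds the generated content. Text content blocks contain the generated lyrics or a JSON description of the song structure. Content blocks of type audio contain the base64-encoded audio data.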

Python

generated_audio = interaction.output_audio
if generated_audio:
    with open("output.mp3", "wb") as f:
        f.write(base64.b64decode(generated_audio.data))

lyrics = interaction.output_text
if lyrics:
    print(f"Lyrics:\n{lyrics}")

JavaScript

const generatedAudio = interaction.output_audio;
if (generatedAudio) {
    fs.writeFileSync("output.mp3", Buffer.from(generatedAudio.data, 'base64'));
}

const lyrics = interaction.output_text;
if (lyrics) {
    console.log("Lyrics:\n" + lyrics);
}

REST

# The output from the REST API is a JSON object containing base64 encoded data.
# You can extract the text or the audio data using a tool like jq.
# To extract the audio and save it to a file:
curl ... | jq -r '.steps[] | select(.type=="model_output") | .content[] | select(.type=="audio") | .data' | base64 -d > output.mp3
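
If you would rather handle the REST response in Python than in jq, the same path can be walked over the parsed JSON. This is a sketch that assumes the response body has the shape implied by the jq filter above (`steps[]` containing `content[]` blocks with `type`, `data`, and `text` fields). `extract_outputs` is a hypothetical helper, and the sample payload is made up for illustration.

```python
import base64
import json
from typing import List, Optional, Tuple


def extract_outputs(response_json: str) -> Tuple[Optional[bytes], List[str]]:
    """Return (audio_bytes, text_blocks) from a REST response body."""
    body = json.loads(response_json)
    audio_bytes = None
    texts = []
    for step in body.get("steps", []):
        if step.get("type") != "model_output":
            continue
        for block in step.get("content", []):
            if block.get("type") == "audio":
                # Keep the last audio block, like interaction.output_audio.
                audio_bytes = base64.b64decode(block["data"])
            elif block.get("type") == "text":
                texts.append(block["text"])
    return audio_bytes, texts


# Made-up payload for illustration only; a real response carries audio data.
sample = json.dumps({
    "steps": [
        {"type": "model_output", "content": [
            {"type": "text", "text": "[Verse 1] ..."},
            {"type": "audio", "data": base64.b64encode(b"fake-audio").decode("ascii")},
        ]},
    ]
})
audio, lyrics = extract_outputs(sample)
```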

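Interleaved lyrics and music

Lyria 3 output is complex, with separate steps and blocks for the generated lyrics (text) and the song itself (audio), so the convenience properties are the recommended shortcut.

If you want full programmatic control over the raw timeline of steps returned by the server (for example, to log each content block as it is received), you can iterate over steps manually instead.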

Python

lyrics = []
audio_data = None

for step in interaction.steps:
    if step.type == "model_output":
        for content_block in step.content:
            if content_block.type == "audio":
                audio_data = base64.b64decode(content_block.data)
            elif content_block.type == "text":
                lyrics.append(content_block.text)

if lyrics:
    print("Lyrics:\n" + "\n".join(lyrics))

if audio_data:
    with open("output.mp3", "wb") as f:
        f.write(audio_data)

JavaScript

const lyrics = [];
let audioData = null;

for (const step of interaction.steps) {
    if (step.type === 'model_output') {
        for (const contentBlock of step.content) {
            if (contentBlock.type === 'audio') {
                audioData = Buffer.from(contentBlock.data, 'base64');
            } else if (contentBlock.type === 'text') {
                lyrics.push(contentBlock.text);
            }
        }
    }
}

if (lyrics.length) {
    console.log("Lyrics:\n" + lyrics.join("\n"));
}

if (audioData) {
    fs.writeFileSync("output.mp3", audioData);
}

Generate music from images

Lyria 3 supports multimodal input. Provide up to 10 images alongside the text prompt in the input list, and the model composes music inspired by the visual content.

Python

import base64

with open("desert_sunset.jpg", "rb") as f:
    image_bytes = f.read()
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=[
        {
            "type": "text",
            "text": "An atmospheric ambient track inspired by the mood and colors in this image.",
        },
        {
            "type": "image",
            "mime_type": "image/jpeg",
            "data": image_b64,
        },
    ],
)

JavaScript

import * as fs from "fs";

const imageBytes = fs.readFileSync("desert_sunset.jpg").toString("base64");

const interaction = await client.interactions.create({
    model: "lyria-3-pro-preview",
    input: [
        {
            type: "text",
            text: "An atmospheric ambient track inspired by the mood and colors in this image.",
        },
        {
            type: "image",
            mime_type: "image/jpeg",
            data: imageBytes,
        },
    ],
});

REST

# Pass base64 encoded image data directly:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": [
      {"type": "text", "text": "An atmospheric ambient track inspired by the mood and colors in this image."},
      {"type": "image", "mime_type": "image/jpeg", "data": "/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAP//////////////////////////////////////////////////////////////////////////////////////wgALCAABAAEBAREA/8QAFBABAAAAAAAAAAAAAAAAAAAAAP/aAAgBAQABPxA="}
    ]
  }'
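
When a request includes several images, a small helper can assemble the input list and enforce the 10-image limit before the request is sent. A minimal sketch: `build_image_input` is a hypothetical helper, and the block shape (`type`, `mime_type`, `data`) simply mirrors the examples in this section.

```python
import base64
from typing import Dict, List, Sequence, Tuple

MAX_IMAGES = 10  # Limit stated in this guide.


def build_image_input(prompt: str, images: Sequence[Tuple[str, bytes]]) -> List[Dict[str, str]]:
    """Build an `input` list from a text prompt and (mime_type, bytes) pairs."""
    if len(images) > MAX_IMAGES:
        raise ValueError(f"At most {MAX_IMAGES} images are supported, got {len(images)}")
    # The text prompt comes first, followed by one block per image.
    blocks = [{"type": "text", "text": prompt}]
    for mime_type, raw in images:
        blocks.append({
            "type": "image",
            "mime_type": mime_type,
            "data": base64.b64encode(raw).decode("utf-8"),
        })
    return blocks
```

The returned list can then be passed as `input=` to `client.interactions.create`.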

Provide custom lyrics

You can write your own lyrics and include them in the prompt. Use section tags such as [Verse], [Chorus], and [Bridge] to help the model understand the song structure.

Python

prompt = """
Create a dreamy indie pop song with the following lyrics:

[Verse 1]
Walking through the neon glow,
city lights reflect below,
every shadow tells a story,
every corner, fading glory.

[Chorus]
We are the echoes in the night,
burning brighter than the light,
hold on tight, don't let me go,
we are the echoes down below.

[Verse 2]
Footsteps lost on empty streets,
rhythms sync to heartbeats,
whispers carried by the breeze,
dancing through the autumn leaves.
"""

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=prompt,
)

JavaScript

const prompt = `
Create a dreamy indie pop song with the following lyrics:

[Verse 1]
Walking through the neon glow,
city lights reflect below,
every shadow tells a story,
every corner, fading glory.

[Chorus]
We are the echoes in the night,
burning brighter than the light,
hold on tight, don't let me go,
we are the echoes down below.

[Verse 2]
Footsteps lost on empty streets,
rhythms sync to heartbeats,
whispers carried by the breeze,
dancing through the autumn leaves.
`;

const interaction = await client.interactions.create({
    model: 'lyria-3-pro-preview',
    input: prompt,
});

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "Create a dreamy indie pop song with the following lyrics: ..."
  }'
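
If lyrics are stored as structured data rather than one long string, the tagged prompt can be assembled programmatically. The sketch below assumes a hypothetical helper, `build_lyrics_prompt`, that takes the musical direction and a list of (tag, lines) pairs. It only formats text and makes no API call.

```python
from typing import List, Tuple


def build_lyrics_prompt(direction: str, sections: List[Tuple[str, List[str]]]) -> str:
    """Combine a musical direction with tagged lyric sections.

    Hypothetical helper: `sections` is a list of (tag, lines) pairs,
    for example ("Verse 1", ["line one", "line two"]).
    """
    parts = [direction.strip()]
    for tag, lines in sections:
        # Each section starts with its tag in square brackets.
        parts.append("[" + tag + "]\n" + "\n".join(lines))
    # Blank lines keep the direction and each section visually separate.
    return "\n\n".join(parts)


prompt = build_lyrics_prompt(
    "Create a dreamy indie pop song with the following lyrics:",
    [
        ("Verse 1", ["Walking through the neon glow,", "city lights reflect below,"]),
        ("Chorus", ["We are the echoes in the night,", "burning brighter than the light,"]),
    ],
)
```

Keeping the direction in its own argument also keeps the lyrics clearly apart from the instructions.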

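Control timing and structure

You can use timestamps to specify exactly what happens at particular moments in the song. This is useful for controlling when instruments come in, when lyrics are delivered, and how the song progresses.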

Python

prompt = """
[0:00 - 0:10] Intro: Begin with a soft lo-fi beat and muffled
              vinyl crackle.
[0:10 - 0:30] Verse 1: Add a warm Fender Rhodes piano melody
              and gentle vocals singing about a rainy morning.
[0:30 - 0:50] Chorus: Full band with upbeat drums and soaring
              synth leads. The lyrics are hopeful and uplifting.
[0:50 - 1:00] Outro: Fade out with the piano melody alone.
"""

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input=prompt,
)

JavaScript

const prompt = `
[0:00 - 0:10] Intro: Begin with a soft lo-fi beat and muffled
              vinyl crackle.
[0:10 - 0:30] Verse 1: Add a warm Fender Rhodes piano melody
              and gentle vocals singing about a rainy morning.
[0:30 - 0:50] Chorus: Full band with upbeat drums and soaring
              synth leads. The lyrics are hopeful and uplifting.
[0:50 - 1:00] Outro: Fade out with the piano melody alone.
`;

const interaction = await client.interactions.create({
    model: 'lyria-3-pro-preview',
    input: prompt,
});

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "[0:00 - 0:10] Intro: ..."
  }'
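
Writing timestamps by hand is error-prone once sections are reordered or resized. As a sketch, the hypothetical helper `build_timed_prompt` below lays segments out back to back from 0:00 and emits the same `[M:SS - M:SS] Label: description` format used in the examples above. The helper names are assumptions for illustration.

```python
from typing import List, Tuple


def format_timestamp(seconds: int) -> str:
    """Format a number of seconds as M:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def build_timed_prompt(segments: List[Tuple[int, str, str]]) -> str:
    """Build a timestamped prompt from (duration_seconds, label, description).

    Hypothetical helper: segments are laid out back to back from 0:00.
    """
    lines = []
    start = 0
    for duration, label, description in segments:
        if duration <= 0:
            raise ValueError("Each segment needs a positive duration")
        end = start + duration
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {label}: {description}")
        start = end
    return "\n".join(lines)


prompt = build_timed_prompt([
    (10, "Intro", "Begin with a soft lo-fi beat and muffled vinyl crackle."),
    (20, "Verse 1", "Add a warm Fender Rhodes piano melody."),
    (20, "Chorus", "Full band with upbeat drums and soaring synth leads."),
    (10, "Outro", "Fade out with the piano melody alone."),
])
```

Because the end of the last segment is the total length, the same list also fixes the target duration for the Pro model.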

Generate instrumental tracks

For background music, game soundtracks, and other use cases that don't need vocals, you can prompt the model to generate an instrumental-only track.

Python

interaction = client.interactions.create(
    model="lyria-3-clip-preview",
    input="A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals.",
)

JavaScript

const interaction = await client.interactions.create({
    model: 'lyria-3-clip-preview',
    input: 'A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals.',
});

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-clip-preview",
    "input": "A bright chiptune melody in C Major, retro 8-bit video game style. Instrumental only, no vocals."
  }'

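Generate music in different languages

Lyria 3 generates lyrics in the language of the prompt. To generate a song with French lyrics, write the prompt in French. The model adapts vocal style and pronunciation to the language.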

Python

interaction = client.interactions.create(
    model="lyria-3-pro-preview",
    input="Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique.",
)

JavaScript

const interaction = await client.interactions.create({
    model: 'lyria-3-pro-preview',
    input: 'Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique.',
});

REST

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lyria-3-pro-preview",
    "input": "Crée une chanson pop romantique en français sur un coucher de soleil à Paris. Utilise du piano et de la guitare acoustique."
  }'

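Model intelligence

Lyria 3 analyzes your prompt in a reasoning step, in which the model works out the song structure (intro, verse, chorus, bridge, and so on) from the prompt. This happens before any audio is generated and helps ensure structural coherence and musicality.

Prompting guide

The more specific your prompt, the better the results. You can include the following elements to guide generation:

Genre: Specify a genre or a blend of genres (for example, "lo-fi hip hop", "jazz fusion", "cinematic orchestral").
Instruments: Name specific instruments (for example, "Fender Rhodes piano", "slide guitar", "TR-808 drum machine").
BPM: Set the tempo (for example, "120 BPM", "slow tempo around 70 BPM").
Key/scale: Specify a musical key (for example, "G major", "D minor").
Mood and atmosphere: Use descriptive adjectives (for example, "nostalgic", "aggressive", "ethereal", "dreamy").
Structure: Use tags such as [Verse], [Chorus], [Bridge], [Intro], and [Outro], or timestamps, to control how the song progresses.
Duration: The Clip model always generates 30-second clips. For the Pro model, state the desired length in the prompt (for example, "create a 2-minute song") or use timestamps to control the duration.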
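
The elements listed above can be kept as structured fields and rendered into a prompt string, which makes it easier to vary one element at a time. This is only a sketch: `MusicPrompt` is a hypothetical container, and the sentence template is one arbitrary way to phrase the prompt, not a format the model requires.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MusicPrompt:
    """Hypothetical container for the prompt elements listed above."""
    genre: str
    instruments: List[str] = field(default_factory=list)
    bpm: Optional[int] = None
    key: Optional[str] = None
    mood: List[str] = field(default_factory=list)
    instrumental_only: bool = False

    def render(self) -> str:
        """Join the populated elements into one descriptive prompt."""
        mood = ", ".join(self.mood)
        parts = [f"A {mood} {self.genre} track" if mood else f"A {self.genre} track"]
        if self.key:
            parts.append(f"in {self.key}")
        if self.bpm:
            parts.append(f"at {self.bpm} BPM")
        if self.instruments:
            parts.append("with " + ", ".join(self.instruments))
        text = " ".join(parts) + "."
        if self.instrumental_only:
            text += " Instrumental only, no vocals."
        return text


prompt = MusicPrompt(
    genre="lo-fi hip hop",
    instruments=["Fender Rhodes piano", "upright bass"],
    bpm=85,
    key="D minor",
    mood=["nostalgic", "dreamy"],
    instrumental_only=True,
).render()
```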

Example prompts

Here are some examples of effective prompts:

"A 30-second lofi hip hop beat with dusty vinyl crackle, mellow Rhodes piano chords, a slow boom-bap drum pattern at 85 BPM, and a jazzy upright bass line. Instrumental only."
"An upbeat, feel-good pop song in G major at 120 BPM with bright acoustic guitar strumming, claps, and warm vocal harmonies about a summer road trip."
"A dark, atmospheric trap beat at 140 BPM with heavy 808 bass, eerie synth pads, sharp hi-hats, and a haunting vocal sample. In D minor."

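Best practices

Iterate with Clip first. Try out prompts with the faster lyria-3-clip-preview model, then move to lyria-3-pro-preview for full-length generation.
Be specific. Vague prompts produce generic results. Specify instruments, BPM, key, mood, and structure for the best output.
Match the language. Write the prompt in the language you want the lyrics in.
Use section tags. [Verse], [Chorus], and [Bridge] tags give the model a clear structure to follow.
Separate lyrics from instructions. When you provide custom lyrics, keep them clearly distinct from your musical direction.

Limitations

Safety: All prompts are checked by safety filters. Prompts that trigger the filters are blocked, including prompts that request a specific artist's voice or copyrighted lyrics.
Watermarking: All generated audio includes a SynthID audio watermark for identification. The watermark is inaudible to the human ear and does not affect the listening experience.
Multi-turn editing: Music generation is a single-turn process. The current version of Lyria 3 does not support iteratively editing or refining a generated clip across multiple prompts.
Length: The Clip model always generates 30-second clips. The Pro model generates songs that run for a few minutes; you can influence the exact duration through the prompt.
Determinism: Results can vary between calls, even with the same prompt.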

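Next steps

Check pricing for the Lyria 3 models.
Try real-time, streaming music generation with Lyria RealTime.
Generate multi-speaker conversations with the TTS models.
Learn how to generate images and videos.
Learn how Gemini understands audio files.
Talk with Gemini in real time using the Live API.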
