Audio Description Generator

Create descriptive audio tracks for YouTube videos within minutes.

What it does

The Audio Description Generator is a tool for creating descriptive audio tracks for short YouTube videos within minutes. Given a YouTube link, the app fetches the video along with its title and description, then splits the video into smaller chunks. These chunks, together with the YouTube metadata, are first sent to Gemini to create a "context file", a first pass that captures general details and identifies any characters. Each chunk is then used to produce two more files: a "loudness file", which measures the volume at every interval, and a "transcript" (generated with Gemini), which lists the video's dialogue with timestamps. All of this information is fed to Gemini once more to create a "script" of timestamped observations. These scripts are then converted to speech with Google Cloud's Text-to-Speech, the resulting audio is stitched back together, and the final result is presented to the user.
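
To make the chunking and loudness steps more concrete, here is a minimal Python sketch. It is not the app's actual code: the function names `split_into_chunks` and `loudness_per_interval`, the 60-second chunk length, the 1-second interval, and the use of RMS level in dBFS as the loudness measure are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def split_into_chunks(duration_s: float, chunk_s: float = 60.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover the whole video.

    The 60-second default is an assumed value, not taken from the app.
    """
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last chunk may be shorter
        chunks.append((start, end))
        start = end
    return chunks


def loudness_per_interval(
    samples: Sequence[float], sample_rate: int, interval_s: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (start_time_s, level_dbfs) rows, one per interval.

    Assumes mono samples as floats in [-1.0, 1.0]. RMS in dBFS is one
    possible loudness measure (hypothetical here); silence maps to -inf.
    """
    window = max(1, int(sample_rate * interval_s))
    rows = []
    for i in range(0, len(samples), window):
        block = samples[i:i + window]
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        level = 20 * math.log10(rms) if rms > 0 else float("-inf")
        rows.append((i / sample_rate, level))
    return rows


if __name__ == "__main__":
    # A 150-second video yields two full chunks and one shorter one.
    print(split_into_chunks(150))
    # Toy audio: one interval of constant signal, then one of silence.
    print(loudness_per_interval([0.5] * 4 + [0.0] * 4, sample_rate=4))
```

Quiet intervals in a table like this could plausibly help place narration where it does not overlap dialogue, though how the app actually uses its loudness file is not described here.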

Built with

  • Web/Chrome
  • Google Cloud: Text-to-Speech

Team

By

Ryan Baumgart

From

Canada