AIAutoResearcher

Summarize and explain the latest AI research in the format of a YouTube video

What it does

The application checks arXiv for the newest AI research papers and analyses them with the Gemini API. From each analysis it produces a YouTube script containing an introduction, the analysis itself and an outro, plus useful metadata such as the video title, description and tags. A local TortoiseTTS installation then turns the script into audio, and a local ComfyUI setup generates a lip-synced avatar. Finally, these artifacts are combined into a YouTube-compatible video, which is uploaded automatically through the YouTube API with the title, description, tags and other metadata filled in.
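The first stage, fetching the newest papers, can be sketched with nothing but the Python standard library. This is an illustration under assumptions, not the project's code: it assumes the public arXiv API endpoint `http://export.arxiv.org/api/query` and the `cs.AI` category, and `Paper`, `build_query_url`, `parse_feed` and `fetch_latest` are hypothetical names. Parsing is kept separate from the network call so it can be checked against a saved feed.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ARXIV_API = "http://export.arxiv.org/api/query"  # assumed endpoint: public arXiv Atom API
ATOM = "{http://www.w3.org/2005/Atom}"           # Atom namespace used by arXiv feeds


@dataclass
class Paper:
    arxiv_id: str
    title: str
    summary: str
    published: str


def build_query_url(category: str = "cs.AI", max_results: int = 5) -> str:
    """Build a query for the most recently submitted papers in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": str(max_results),
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _clean(entry: ET.Element, tag: str) -> str:
    """Read a child element's text and collapse arXiv's line-wrapped whitespace."""
    return " ".join((entry.findtext(f"{ATOM}{tag}") or "").split())


def parse_feed(xml_text: str) -> list[Paper]:
    """Turn an arXiv Atom feed into Paper records."""
    root = ET.fromstring(xml_text)
    return [
        Paper(_clean(e, "id"), _clean(e, "title"), _clean(e, "summary"), _clean(e, "published"))
        for e in root.findall(f"{ATOM}entry")
    ]


def fetch_latest(category: str = "cs.AI", max_results: int = 5) -> list[Paper]:
    """Download and parse the newest papers (performs a network request)."""
    with urllib.request.urlopen(build_query_url(category, max_results)) as resp:
        return parse_feed(resp.read().decode("utf-8"))
```

Sorting by `submittedDate` in descending order returns the most recent submissions first; covering more areas is a matter of changing the `search_query` string (for example to `cat:cs.AI OR cat:cs.LG`).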
To make the output robust, I used a chain of prompts to the Gemini LLM. This gave better control over the content and made the responses much more engaging and easier to follow. Every request asks for a JSON response, and the required fields are validated, so that the model correctly interprets the requirements and replies in the expected format.
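A prompt chain with JSON replies and required-field validation might look roughly like this. It is a sketch, not the author's implementation: the model call is injected as a plain function (`LLMCall`, a hypothetical stand-in for whatever wraps the Gemini client), and the three steps, prompt wording, field names and the retry-with-feedback policy are all assumptions made for illustration.

```python
import json
from typing import Callable

# Hypothetical stand-in for the Gemini call: takes a prompt, returns the model's raw text reply.
LLMCall = Callable[[str], str]


class InvalidResponse(ValueError):
    """Raised when a reply is not usable JSON or lacks required fields."""


def extract_json(raw: str) -> dict:
    """Parse the outermost {...} object, tolerating prose or fences around it."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise InvalidResponse("no JSON object in reply")
    try:
        return json.loads(raw[start : end + 1])
    except json.JSONDecodeError as exc:
        raise InvalidResponse(f"malformed JSON: {exc}") from exc


def validate(data: dict, required: list[str]) -> dict:
    """Reject replies where any required field is absent or empty."""
    missing = [key for key in required if data.get(key) in (None, "", [], {})]
    if missing:
        raise InvalidResponse(f"missing required fields: {missing}")
    return data


def ask_json(llm: LLMCall, prompt: str, required: list[str], retries: int = 2) -> dict:
    """Ask for JSON; on a bad reply, re-prompt with the validation error appended."""
    error = None
    for _ in range(retries + 1):
        text = prompt if error is None else (
            f"{prompt}\n\nYour previous reply was invalid ({error}). Reply with JSON only."
        )
        try:
            return validate(extract_json(llm(text)), required)
        except InvalidResponse as exc:
            error = exc
    raise InvalidResponse(f"gave up after {retries + 1} attempts: {error}")


def run_chain(llm: LLMCall, abstract: str) -> dict:
    """Hypothetical three-step chain: analysis -> script -> YouTube metadata."""
    analysis = ask_json(
        llm,
        'Analyse this abstract. Reply as JSON with "key_points" and "significance".\n' + abstract,
        ["key_points", "significance"],
    )
    script = ask_json(
        llm,
        'Write a video script as JSON with "intro", "analysis" and "outro".\n' + json.dumps(analysis),
        ["intro", "analysis", "outro"],
    )
    metadata = ask_json(
        llm,
        'Suggest "title", "description" and "tags" (a list) as JSON.\n' + json.dumps(script),
        ["title", "description", "tags"],
    )
    return {"analysis": analysis, "script": script, "metadata": metadata}
```

Each step consumes the validated output of the previous one, so a malformed reply is caught at the step that produced it instead of surfacing later in the video. Feeding the validation error back to the model is one common recovery strategy; the project's actual retry behaviour is not described.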
Replacing the local, open-source TortoiseTTS installation with the paid Google TTS API would give higher audio quality and faster processing.
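One way to make such a swap cheap is to hide speech synthesis behind a small interface, since the rest of the pipeline only needs "text in, audio file out". The sketch below is an assumption, not the project's design: `SpeechBackend`, `GoogleCloudTTS`, `DryRunBackend` and `narrate` are hypothetical names. The Google backend uses the `google-cloud-texttospeech` client library (it needs that package and Google Cloud credentials), and a TortoiseTTS backend would simply wrap the existing local call in the same `synthesize` method; its API is not shown here.

```python
from pathlib import Path
from typing import Protocol


class SpeechBackend(Protocol):
    """Anything that turns script text into an audio file."""

    def synthesize(self, text: str, out_path: Path) -> Path: ...


class GoogleCloudTTS:
    """Backend built on the google-cloud-texttospeech client (requires credentials)."""

    def __init__(self, language_code: str = "en-US") -> None:
        self.language_code = language_code

    def synthesize(self, text: str, out_path: Path) -> Path:
        from google.cloud import texttospeech  # imported lazily: optional dependency

        client = texttospeech.TextToSpeechClient()
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=texttospeech.VoiceSelectionParams(language_code=self.language_code),
            audio_config=texttospeech.AudioConfig(
                audio_encoding=texttospeech.AudioEncoding.LINEAR16
            ),
        )
        out_path.write_bytes(response.audio_content)  # LINEAR16 replies are WAV data
        return out_path


class DryRunBackend:
    """Records requests without producing audio; handy for testing the pipeline."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, Path]] = []

    def synthesize(self, text: str, out_path: Path) -> Path:
        self.calls.append((text, out_path))
        return out_path


def narrate(backend: SpeechBackend, sections: dict[str, str], out_dir: Path) -> list[Path]:
    """Synthesize each script section (e.g. intro, analysis, outro) to its own file."""
    return [backend.synthesize(text, out_dir / f"{name}.wav") for name, text in sections.items()]
```

With this shape, moving from TortoiseTTS to the Google API means passing a different backend object to `narrate`; the downstream lip-sync and video steps keep receiving the same kind of audio files.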

Built with

  • Web/Chrome
  • YouTube API

Team

By

Paweł Szpyt

From

Poland