LangFlip

Translate & Lip-sync your videos into any language.

What it does

LangFlip relies heavily on the Gemini multimodal API. It sends the original video to Gemini and asks it to:
1. Generate captions for the video. One of the main difficulties in translating a video is maintaining the rhythm of the original. For example, if we translate an English video into German, the translation will very likely have more words and run longer than the original. Gemini can detect when the speaker pauses, which lets us group the captions into segments that each end at a pause.
2. Generate the translations. Again, we want to maintain the rhythm of the original video. Google Translate provides a literal translation of the original sentences, but we want translations that are roughly the same length as the original. Gemini can generate translations with roughly the same number of characters as the original sentence.
3. Detect which frames need to be lip-synced. We want to send the lip-syncing AI model only the frames where the speaker is clearly visible and talking. If we send it frames with no visible face, it might crash and fail the lip-syncing process. Gemini can take the video and return all the timestamps where the speaker appears.
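
To make the three steps more concrete, here is a minimal Python sketch of how their outputs could be combined once Gemini has returned them. It is not LangFlip's actual code and it makes no Gemini calls. The `Caption` type, the function names, the 20% length tolerance, and the sample timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """Hypothetical shape of one caption segment (steps 1 and 2)."""
    start: float       # seconds, where the segment begins
    end: float         # seconds, where the speaker pauses
    text: str          # original sentence
    translation: str   # translated sentence


def length_ratio(caption: Caption) -> float:
    """Translated length divided by original length, in characters."""
    return len(caption.translation) / max(len(caption.text), 1)


def needs_retranslation(caption: Caption, tolerance: float = 0.2) -> bool:
    """True when the translation length differs by more than `tolerance`.

    The 20% default is an assumed threshold, not a value from LangFlip.
    """
    return abs(length_ratio(caption) - 1.0) > tolerance


def merge_intervals(intervals, max_gap=0.0):
    """Merge (start, end) intervals that overlap or lie within `max_gap` seconds."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def lip_sync_segments(captions, face_intervals):
    """Return the parts of each caption during which the speaker is visible (step 3)."""
    visible = merge_intervals(face_intervals)
    segments = []
    for caption in captions:
        for start, end in visible:
            lo, hi = max(caption.start, start), min(caption.end, end)
            if lo < hi:  # keep only non-empty overlaps
                segments.append((lo, hi))
    return segments


# Made-up sample data standing in for Gemini's responses.
captions = [
    Caption(0.0, 4.0, "Hello everyone", "Hallo zusammen"),
    Caption(5.0, 9.0, "Thanks", "Vielen herzlichen Dank"),
]
face_intervals = [(1.0, 3.0), (2.5, 6.0), (8.0, 10.0)]

too_long = [c.text for c in captions if needs_retranslation(c)]
segments = lip_sync_segments(captions, face_intervals)
```

With this sample data, the second caption would be flagged for a shorter translation, and only the spans where a caption overlaps a speaker-visible interval would be passed on to the lip-syncing model.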

Built with

Flutter
Firebase

Team

Rémy Menard

From

France