MERLIN
MERLIN: Conversational Video Search, Tailored to Your Intent
What it does
MERLIN is a video search platform that changes how you discover and access video content. By combining Gemini Flash with Vertex multimodal embeddings, MERLIN lets you refine a search through conversation until the results match your intent.
At its core, MERLIN integrates a large language model with multimodal embeddings. When you submit an initial text query, our backend computes the query embedding and performs a vector search against pre-computed video embeddings. If the results don't quite hit the mark, you can continue in a natural conversation with MERLIN, powered by Gemini Flash.
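The initial retrieval step described above can be sketched as follows. This is a minimal illustration, not MERLIN's backend: it assumes cosine similarity as the ranking score, uses toy 3-dimensional vectors in place of real Vertex embeddings, and the names `cosine_similarity`, `search_videos`, and `VIDEO_EMBEDDINGS` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def search_videos(query_embedding, video_embeddings, top_k=3):
    """Return the top_k (video_id, score) pairs, best match first."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for pre-computed video embeddings (assumed data).
VIDEO_EMBEDDINGS = {
    "cooking_pasta": [0.9, 0.1, 0.0],
    "soccer_highlights": [0.0, 0.2, 0.9],
    "baking_bread": [0.7, 0.6, 0.1],
}

# Toy stand-in for the embedding of the user's text query.
results = search_videos([1.0, 0.2, 0.0], VIDEO_EMBEDDINGS, top_k=2)
for video_id, score in results:
    print(f"{video_id}: {score:.3f}")
```

In a deployed system the brute-force loop would typically be replaced by an approximate nearest-neighbor index, but the ranking idea is the same.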
As you converse, MERLIN uses Vertex to extract multimodal embeddings from the dialogue, capturing the context of what you are looking for. These dialogue embeddings are interpolated with the initial query embedding, and a new vector search is run against the video database with the refined embedding.
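The refinement step can be illustrated with a short sketch. The text above only says the embeddings are "interpolated"; the linear blend of unit vectors below, the weight `alpha`, and all function names and toy vectors are assumptions for illustration, not the actual MERLIN implementation.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def interpolate(query_embedding, dialogue_embedding, alpha=0.5):
    """Blend two embeddings; alpha is the weight given to the dialogue (assumed)."""
    q = normalize(query_embedding)
    d = normalize(dialogue_embedding)
    mixed = [(1.0 - alpha) * qi + alpha * di for qi, di in zip(q, d)]
    return normalize(mixed)


def rank(query_embedding, video_embeddings):
    """Return video ids sorted by cosine similarity, best match first."""
    q = normalize(query_embedding)
    scores = {
        video_id: sum(a * b for a, b in zip(q, normalize(embedding)))
        for video_id, embedding in video_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy stand-ins for pre-computed video embeddings (assumed data).
VIDEOS = {
    "cooking_pasta": [0.9, 0.1, 0.0],
    "baking_bread": [0.3, 0.9, 0.1],
    "soccer_highlights": [0.0, 0.2, 0.9],
}

initial_query = [1.0, 0.1, 0.0]  # embedding of the first text query
dialogue = [0.1, 1.0, 0.0]       # embedding of the clarifying conversation

before = rank(initial_query, VIDEOS)
refined = interpolate(initial_query, dialogue, alpha=0.7)
after = rank(refined, VIDEOS)

print("before:", before)
print("after: ", after)
```

With these toy vectors, the initial query ranks the pasta video first, while the refined embedding moves the bread video to the top. A larger `alpha` lets the conversation dominate; a smaller one keeps results closer to the original query.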
Throughout this process, vector embeddings and metadata are stored in Firestore, while the videos and thumbnails themselves reside in Firebase, which keeps the experience responsive.
By combining Gemini Flash's conversational AI with Vertex's multimodal embeddings, MERLIN refines its understanding of your search intent as the conversation progresses and surfaces more relevant results.
This project is derived from our work, "MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline."
Built with
- Web/Chrome
- Firebase
- Python
Team
By
MERLIN: Your Intelligent video search companion
From
South Korea