Gemini Humanoid Robot

A Gemini-powered humanoid robot that creates a more natural and engaging conversational experience.

What it does

The integration consists of four components: the Chatbot Service, the Chatbot Bridge, the Speech Recognition Module, and the Dialogue Module.
The Chatbot Service manages the entire dialogue history and uses the Gemini-pro model to generate each response in the context of previous interactions.
The Chatbot Bridge uses ZeroMQ to connect the different programming environments involved, passing messages between the Chatbot Service and the NaoQi extension modules.
The Speech Recognition Module is responsible for capturing audio input from the robot's microphone using the NaoQi ALAudioDevice. The audio recordings are segmented through volume thresholding, and the segmented recordings are then sent to Google's cloud service for speech-to-text analysis. If the speech is successfully recognised, the resulting text is forwarded to the Chatbot Bridge.
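To illustrate what segmenting through volume thresholding can look like, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the RMS volume measure, the `threshold` value, and the `max_silence` parameter are all assumptions made for illustration, and the frames are plain lists of float samples rather than buffers delivered by ALAudioDevice.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square volume of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_by_volume(
    frames: Sequence[Sequence[float]],
    threshold: float,
    max_silence: int = 2,
) -> List[List[Sequence[float]]]:
    """Group frames into utterance segments using a volume threshold.

    A segment starts at the first frame whose RMS reaches `threshold`.
    Up to `max_silence` consecutive quiet frames are tolerated inside a
    segment; a longer quiet run closes the segment. Trailing quiet frames
    are not included in the segment. (Hypothetical helper, for illustration.)
    """
    segments: List[List[Sequence[float]]] = []
    current: List[Sequence[float]] = []
    pending: List[Sequence[float]] = []  # quiet frames seen since the last loud one

    for frame in frames:
        if rms(frame) >= threshold:
            # A short pause inside an utterance is kept as part of it.
            current.extend(pending)
            pending = []
            current.append(frame)
        elif current:
            pending.append(frame)
            if len(pending) > max_silence:
                # The pause is long enough: close the current segment.
                segments.append(current)
                current, pending = [], []

    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    quiet = [0.0, 0.0]
    loud = [0.5, -0.5]
    stream = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, loud, quiet]
    for i, segment in enumerate(segment_by_volume(stream, threshold=0.1)):
        print(f"segment {i}: {len(segment)} frames")
```

In a setup like the one described, each returned segment would be the unit sent off for speech-to-text analysis, so silence is never uploaded and each request covers roughly one utterance.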
The Dialogue Module transforms the text responses generated by Gemini into spoken language using the NaoQi ALAnimatedSpeech. This module also coordinates with the Speech Recognition Module to pause audio recordings while the robot is speaking, ensuring a turn-based dialogue system where the robot listens and responds alternately.
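The turn-taking coordination described above can be sketched with a shared flag that the recorder checks before capturing audio. This is an assumed design, not the project's implementation: `TurnGate`, `can_listen`, and `speak` are hypothetical names, and the `say` callable is a stand-in for whatever call actually drives the robot's speech.

```python
import threading
from typing import Callable


class TurnGate:
    """Shared flag telling the recorder whether it may capture audio.

    Hypothetical helper: listening is allowed by default and is suspended
    for the duration of each spoken response.
    """

    def __init__(self) -> None:
        self._listening = threading.Event()
        self._listening.set()  # the robot starts out listening

    def can_listen(self) -> bool:
        """True while the robot is not speaking."""
        return self._listening.is_set()

    def speak(self, text: str, say: Callable[[str], None]) -> None:
        """Pause listening, speak `text` via `say`, then resume listening."""
        self._listening.clear()
        try:
            say(text)  # assumed to block until the speech has finished
        finally:
            # Resume listening even if the speech call raises.
            self._listening.set()


if __name__ == "__main__":
    gate = TurnGate()

    def fake_say(text: str) -> None:
        # Stand-in for the robot's speech output.
        print(f"robot says: {text} (listening: {gate.can_listen()})")

    gate.speak("Hello, how can I help?", fake_say)
    print(f"listening again: {gate.can_listen()}")
```

Using a `threading.Event` keeps the check cheap for an audio callback running on another thread, and the `try`/`finally` ensures the robot does not stay deaf if speaking fails.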

Built with

Humanoid robot

Team

From