Blind Companion

Assistive voice technology for blind users

What it does

The application listens for the user's voice commands and analyzes them with the Google Gemini API to turn them into actions. These commands can vary: for example, opening a specific application or browsing a specific website on the Internet. The application handles all of these tasks by voice alone.

Technically, this is achieved by combining speech recognition, which captures the user's voice and converts it into text, with text-to-speech, which performs the opposite process and reads responses aloud. Together they let users interact comfortably without a keyboard. The Google Gemini API interprets the recognized text and provides quick, accurate responses, which improves the user experience.

As for the steps I followed to implement the idea: after settling on the concept, I chose the tools I would likely need and then started building the application. The first problem I faced was installing Python, which I solved after some research on Google. The code itself brings together several libraries I found online, each of which performs a specific function. When the program is launched from the terminal, it waits a few seconds; once the user says the word "Open", it carries out the command.
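The project's full code is not shown here, but a minimal sketch of the loop described above could look like the following. It assumes the SpeechRecognition, pyttsx3, and google-generativeai Python packages, a "gemini-1.5-flash" model name, a GEMINI_API_KEY environment variable, and a made-up one-line action format (OPEN_URL / OPEN_APP / SAY) that Gemini is asked to reply with; none of these details are confirmed by the project itself.

```python
# Minimal sketch of the voice-command loop: listen, interpret with Gemini, act, speak.
# Assumptions (not from the project): package choices, model name, prompt format,
# and the GEMINI_API_KEY environment variable.
import os
import webbrowser

import google.generativeai as genai
import pyttsx3
import speech_recognition as sr

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

recognizer = sr.Recognizer()
tts = pyttsx3.init()


def speak(text: str) -> None:
    """Read a response aloud so the user never needs the screen."""
    tts.say(text)
    tts.runAndWait()


def listen() -> str:
    """Capture one utterance from the microphone and return it as text."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)  # speech -> text


def interpret(command: str) -> str:
    """Ask Gemini to map the spoken command to a single action line."""
    prompt = (
        "You assist a blind user. Reply with exactly one line:\n"
        "OPEN_URL <url>, OPEN_APP <name>, or SAY <answer>.\n"
        f"Command: {command}"
    )
    return model.generate_content(prompt).text.strip()


if __name__ == "__main__":
    speak("Say a command starting with the word open.")
    command = listen()
    if command.lower().startswith("open"):
        action = interpret(command)
        if action.startswith("OPEN_URL "):
            webbrowser.open(action.split(maxsplit=1)[1])   # browse a website
        elif action.startswith("OPEN_APP "):
            os.system(action.split(maxsplit=1)[1])          # simplistic app launch
        else:
            speak(action.removeprefix("SAY").strip())       # spoken answer
```

Asking Gemini to answer in a constrained one-line format keeps the parsing trivial, so the spoken command can be mapped to an action without any extra natural-language handling in the application itself.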

Built with

  • Speech Recognition
  • Text-to-Speech (TTS)

Team

By

World Assistants

From

Egypt