Simón

A multimodal toy robot that uses Gemini function calling to select and run generated behaviors

What it does

Simón tries to imitate humans, like the game "Simon Says". First, a human records a short video, image, or audio clip via a Gradio Python app running in Chrome on a touchscreen display. The media is uploaded to the Gemini API, which returns a text description of the scene and any humans in it. That description is then wrapped in a prompt so that Gemini function calling can choose the best robot behavior function from a couple dozen candidates. The behavior functions are hand-crafted, but novel behaviors can also be generated with Gemini (code generation) via a provided script. We hosted a YouTube livestream showing developers how to create their own behavior functions.

Simón is made of foam, socks, and tape and runs on a Raspberry Pi with a camera, a USB microphone and speakers, three hobby servos, two LED eyes, and a touchscreen display.

All code is open source, and we provide a full Build Guide with install instructions and a bill of materials (BOM). A helper script lets developers ask a Gemini chat instance questions about Simón, with relevant project context pre-populated. Everything is written in Python, and we use asyncio to run behavior functions and Gemini API calls in parallel. The design is modular so it is easy to customize and extend. Our hope is that developers can use Simón as a launching point for their own robotics projects built on the Gemini API. The sketches below illustrate the capture UI, the behavior-selection step, and the async execution.
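The touchscreen front end is a Gradio app. Here is a minimal sketch of a capture UI, assuming the gradio package; the imitate handler and its return string are hypothetical placeholders for the real describe → select → run pipeline:

```python
import gradio as gr

def imitate(image_path):
    """Hypothetical handler: in the real app this would call Gemini to
    describe the capture, pick a behavior, and run it on the robot."""
    return f"Simón captured: {image_path}"

# Minimal touchscreen capture UI; the real app also accepts video and audio.
demo = gr.Interface(fn=imitate, inputs=gr.Image(type="filepath"), outputs="text")
demo.launch()
```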
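The describe-then-select step looks roughly like the following sketch, written against the google-generativeai SDK under the assumption that behaviors are plain Python functions passed as tools; wave_arms, nod_head, the model name, and the file path are illustrative stand-ins, not Simón's actual behavior set:

```python
import google.generativeai as genai

genai.configure(api_key="GEMINI_API_KEY")  # assumption: the real app reads this from config

# Hypothetical stand-ins for Simón's hand-crafted behavior functions.
def wave_arms():
    """Wave both arms up and down."""

def nod_head():
    """Nod the head servo forward and back."""

# 1) Upload the captured media and ask Gemini for a scene description.
describer = genai.GenerativeModel("gemini-1.5-flash")
capture = genai.upload_file("capture.jpg")  # hypothetical path written by the Gradio app
description = describer.generate_content(
    [capture, "Describe the scene and what any humans in it are doing."]
).text

# 2) Hand the description back with the behavior functions as tools and let
#    function calling pick the best match.
selector = genai.GenerativeModel("gemini-1.5-flash", tools=[wave_arms, nod_head])
response = selector.generate_content(
    f"A human did the following: {description}\n"
    "Call the one behavior function that best imitates them."
)
part = response.candidates[0].content.parts[0]
if part.function_call:
    print("Chosen behavior:", part.function_call.name)
```

The returned function name can then be dispatched to the matching behavior on the robot.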
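Behaviors and API calls run concurrently with asyncio. A minimal sketch, with hypothetical coroutines standing in for servo control and the Gemini call:

```python
import asyncio

# Hypothetical async stand-ins for a behavior function and a Gemini API call.
async def run_behavior(name: str) -> None:
    """Drive the servos/LEDs for the chosen behavior (simulated here)."""
    print(f"running behavior: {name}")
    await asyncio.sleep(2)  # stands in for servo movement time

async def describe_next_capture() -> str:
    """Ask Gemini to describe the next capture (simulated here)."""
    await asyncio.sleep(1)  # stands in for the API round trip
    return "a person waving both hands"

async def main() -> None:
    # Run the current behavior and the next Gemini call concurrently, so the
    # robot keeps moving while the network round trip is in flight.
    _, description = await asyncio.gather(
        run_behavior("wave_arms"),
        describe_next_capture(),
    )
    print("next description:", description)

asyncio.run(main())
```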

Built with

  • Web/Chrome

Team

By

hu-po

From

United States