DEC 11, 2024
Gemini Powers tldraw's "Natural Language Computing" Experience
Unlocking Natural Language Interactions with the Gemini API
The Gemini API lets developers integrate advanced AI capabilities into their applications, opening new possibilities for user experience and functionality. This post highlights how tldraw uses Gemini to build a "natural language computing" experience within their new project, computer. The project shows how quickly a startup can integrate powerful AI using the Gemini API and tldraw’s canvas SDK. The tldraw team is launching computer with Gemini 1.5 Flash soon (join the waitlist) and is currently prototyping with Gemini 2.0 Flash for future iterations.
tldraw is using the Gemini API to bring conversational AI to visual programming, allowing users to generate content and process information using natural language. This opens up opportunities for a more intuitive and efficient user experience with AI, pushing the boundaries of visual communication.
The Vision Behind Computer
tldraw, striving to make diagramming accessible and intuitive, envisioned a more natural way for users to interact with their canvas. Founder Steve Ruiz sought to leverage the power of tldraw’s infinite canvas SDK to create a dynamic environment for working with generative AI. This vision led to the development of computer, an experimental application where users create workflows from blocks of text, images, and instructions. When run, information flows from one component to the next, with the output of each generation serving as the input to the next, creating powerful processes that branch, loop, and iterate to produce outputs.
Building with Gemini 2.0: A Deep Dive into Computer
tldraw’s computer is built on a network of interconnected “components” representing elements on the canvas (text boxes, images, audio clips, etc.). These components are linked by arrows, which visualize the flow of data and transformations. Each component has associated “procedures”: sets of instructions executed based on inputs from connected components. A component can accept data from any number of other components and pass its output to many other components, including itself! This component-based architecture, combined with the speed of Gemini 2.0 Flash, makes for a fast and flexible system capable of handling diverse tasks.
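To make the component-and-arrow model concrete, here is a minimal Python sketch of a workflow graph. It is not tldraw's implementation: the names Component, Workflow, step, and constant are hypothetical, and each "procedure" is a plain Python function standing in for a model call. It only illustrates how outputs can flow along arrows, including an arrow from a component back to itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A procedure turns the outputs of upstream components into a new output.
# In this sketch it is an ordinary function, not a model call.
Procedure = Callable[[List[str]], str]


@dataclass
class Component:
    name: str
    procedure: Procedure
    output: str = ""


@dataclass
class Workflow:
    components: Dict[str, Component] = field(default_factory=dict)
    arrows: List[Tuple[str, str]] = field(default_factory=list)  # (source, target)

    def add(self, name: str, procedure: Procedure) -> None:
        self.components[name] = Component(name, procedure)

    def connect(self, source: str, target: str) -> None:
        self.arrows.append((source, target))

    def inputs_of(self, name: str) -> List[str]:
        # Collect the current outputs of every component pointing at `name`.
        return [self.components[s].output for s, t in self.arrows if t == name]

    def step(self) -> None:
        # Compute every new output from the current state first, then commit,
        # so loops and self-arrows see the previous step's values.
        new = {n: c.procedure(self.inputs_of(n)) for n, c in self.components.items()}
        for name, out in new.items():
            self.components[name].output = out


def constant(text: str) -> Procedure:
    # A source component (like a text box) ignores its inputs.
    return lambda inputs: text


wf = Workflow()
wf.add("product", constant("smartgloves for cats"))
wf.add("shout", lambda inputs: " ".join(inputs).upper())
wf.add("counter", lambda inputs: (inputs[0] if inputs else "") + "!")
wf.connect("product", "shout")
wf.connect("counter", "counter")  # a component feeding itself

wf.step()  # "product" publishes its text; "counter" becomes "!"
wf.step()  # "shout" now sees the product text; "counter" becomes "!!"
print(wf.components["shout"].output)
print(wf.components["counter"].output)
```

Running the graph in discrete steps is one simple way to keep loops well defined; a real system would also need a stopping rule for workflows that iterate.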
Here's how prototyping with Gemini 2.0 Flash has shaped the experience:
Lightning-Fast Procedure Execution: Gemini 2.0 Flash executes procedures rapidly. For example, an "Instruction" component might contain "Write a short commercial." Within moments of being triggered, the component will have generated a re-usable script of steps that can turn any combination of inputs into a commercial script. The component will then use this script, together with its current inputs (e.g., a "Text" component with "New AI-powered smartgloves for cats"), to make a second prompt to the model for its final output. This output may be passed to another linked "Text" component for display, as well as other connected components, like "Speech" for text-to-speech, "Image" for visual generation, or other “Instruction” components for further transformation.
Lots of Context, Many Modes: The maximalist bent in tldraw’s computer called for speed, capacity, and capability. With multiple components providing data for each generation, Gemini 2.0 Flash’s large context window was critical for producing outputs that took all inputs into account, as was its support for images and files alongside written prompts.
Structured Data: The flow of data between components would not be possible without adherence to a single schema. The structured JSON output from Gemini 2.0 Flash ensures that each component in a workflow can recognize data of any type and produce its outputs in the same structure, preventing stalls, smoothing execution, and ensuring even large workflows will reliably complete.
Dynamic Procedure Generation: Beyond executing predefined procedures, Gemini 2.0 Flash can generate procedures dynamically. A user could input "create a marketing campaign based on this product description," and Gemini 2.0 Flash would generate the necessary steps (procedures) and the required components, building a workflow on the canvas based on the user's high-level request. This dynamic generation unlocks tremendous potential for innovative user experiences and streamlined workflows.
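The two-prompt flow and the shared schema described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the envelope fields ("type" and "content"), the prompt wording, and the function names are hypothetical, and fake_model is a stand-in that returns canned JSON instead of calling Gemini. The sketch shows an instruction first becoming a reusable procedure, then that procedure being applied to the current inputs, with every response validated against one structure.

```python
import json
from typing import Callable, Dict, List

# A model is anything that maps a prompt to JSON text (assumption for this sketch).
Model = Callable[[str], str]

# Hypothetical shared envelope: every component reads and writes this shape.
REQUIRED_KEYS = {"type", "content"}


def parse_envelope(raw: str) -> Dict[str, str]:
    """Parse model output and reject anything that breaks the shared schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


def make_procedure(model: Model, instruction: str) -> str:
    """First prompt: turn a short instruction into reusable steps."""
    prompt = f"Write reusable steps for: {instruction}"
    return parse_envelope(model(prompt))["content"]


def run_procedure(model: Model, procedure: str, inputs: List[Dict[str, str]]) -> Dict[str, str]:
    """Second prompt: apply the stored steps to the component's current inputs."""
    prompt = f"Follow these steps:\n{procedure}\nInputs:\n{json.dumps(inputs)}"
    return parse_envelope(model(prompt))


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call; returns canned, schema-conforming JSON.
    if prompt.startswith("Write reusable steps"):
        return json.dumps({"type": "procedure", "content": "1. Name the product. 2. Add a tagline."})
    return json.dumps({"type": "text", "content": "Smartgloves for cats: purr-fect grip."})


procedure = make_procedure(fake_model, "Write a short commercial.")
result = run_procedure(
    fake_model,
    procedure,
    [{"type": "text", "content": "New AI-powered smartgloves for cats"}],
)
print(procedure)
print(result["type"], "->", result["content"])
```

Because the procedure is stored separately from the inputs, the same steps can be reused whenever an upstream component changes, and the validation step is where a malformed response would be caught before it stalls the rest of the workflow.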
A Quick Win for Innovation
tldraw’s quick implementation of computer highlights Gemini’s value proposition for startups: rapid prototyping, enhanced user experience through intuitive natural language interfaces, and efficient structured data handling thanks to models like Gemini 2.0 Flash. This combination empowers small teams to create innovative, AI-powered features quickly and cost-effectively.
“We want to show that any team can build ambitious projects with tldraw’s canvas SDK. Gemini Flash was a perfect engine for a fast, multi-modal, canvas-based workflow tool. With Gemini 2.0 and perhaps a better name, I’m pretty sure we could pitch computer as its own startup tomorrow.”
Empower Your Application with the Gemini API
Inspired by tldraw's success? The Gemini API offers powerful models like Gemini 1.5 Pro, Gemini 1.5 Flash, and now Gemini 2.0 Flash as an experimental preview model to bring innovative AI features to your application. Explore the Gemini API documentation and empower your users with AI.
For creative professionals, developers, and teams of all kinds, tldraw offers a unique and powerful platform to bring ideas to life. Join the computer waitlist. Experience the future of visual collaboration today.