
OCT 20, 2025

Firecrawl uses Gemini 2.5 Pro to structure web data for AI applications

Eric Ciarla

Co-Founder

Vishal Dharmadhikari

Product Solutions Engineer


AI applications, such as retrieval-augmented generation (RAG) systems and autonomous agents, increasingly require access to live, real-world information from the web. However, web content is often unstructured, dynamic, and inconsistent, making reliable data extraction a significant challenge for developers.

Firecrawl, an AI-first web data platform, provides APIs that enable developers and AI systems to programmatically find, fetch, parse, and structure web data at scale. These APIs abstract away the complexity of traditional web scraping, turning unstructured web content into clean, usable data.

To achieve this, Firecrawl uses Gemini 2.5 Pro to power its core extraction engine. Gemini models provide the advanced language understanding and reasoning capabilities necessary to accurately parse diverse and irregular web content.

Turning the unstructured web into usable data

Firecrawl aims to make the entire web accessible for AI systems. Traditional rule-based web scraping methods are often brittle and require constant maintenance because website structures change frequently. Firecrawl needed a solution capable of understanding context and extracting data reliably, even from highly variable sources.

Firecrawl developed two core products using Gemini 2.5 Pro:

  • SmartScrape: An extraction tool that uses Gemini 2.5 Pro’s language understanding and reasoning capabilities to transform raw HTML into structured outputs, such as JSON or key-value pairs. It performs context-aware extraction, understanding the meaning of data relative to user-specified goals, rather than just its location on a page.
  • FIRE-1: An experimental agent framework that uses Gemini 2.5 Pro to interpret user intent, navigate web content, and generate outputs based on live web data.
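
The extract-then-validate pattern behind SmartScrape can be illustrated with a minimal Python sketch. It is not Firecrawl's implementation: the simplified schema format, the helper names (build_extraction_prompt, parse_extraction, smart_extract), and the fake_model stub that stands in for a Gemini 2.5 Pro request are all hypothetical. The point is the shape of the pipeline: describe the fields you want, let the model read the raw HTML against that goal, then check its JSON reply before anything downstream uses it.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical target schema: field name -> expected Python type.
# A real system would more likely use JSON Schema; this is a simplified stand-in.
PRODUCT_SCHEMA: Dict[str, type] = {"name": str, "price": float, "in_stock": bool}


def build_extraction_prompt(html: str, goal: str, schema: Dict[str, type]) -> str:
    """Combine the raw HTML, the user's goal, and the desired fields into one prompt."""
    fields = ", ".join(f"{key} ({typ.__name__})" for key, typ in schema.items())
    return (
        f"Goal: {goal}\n"
        f"Return only a JSON object with these fields: {fields}.\n"
        f"HTML:\n{html}"
    )


def parse_extraction(raw: str, schema: Dict[str, type]) -> Dict[str, Any]:
    """Parse the model's JSON reply and check every field against the schema."""
    data = json.loads(raw)
    result: Dict[str, Any] = {}
    for key, expected in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # Accept a JSON integer where a float is expected (e.g. a price of 24),
        # but not a boolean, since bool is a subclass of int in Python.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {key!r} should be {expected.__name__}")
        result[key] = value
    return result


def smart_extract(html: str, goal: str, schema: Dict[str, type],
                  call_model: Callable[[str], str]) -> Dict[str, Any]:
    """Hypothetical pipeline: prompt the model, then validate its structured output."""
    prompt = build_extraction_prompt(html, goal, schema)
    return parse_extraction(call_model(prompt), schema)


def fake_model(prompt: str) -> str:
    # Stand-in for a Gemini 2.5 Pro call; a real client would send `prompt` to the API.
    return '{"name": "Desk Lamp", "price": 24, "in_stock": true}'


html = "<div class='item'><h2>Desk Lamp</h2><span>$24</span><em>In stock</em></div>"
print(smart_extract(html, "Extract the product details", PRODUCT_SCHEMA, fake_model))
# {'name': 'Desk Lamp', 'price': 24.0, 'in_stock': True}
```

Validating the reply in plain code, rather than trusting it, is what turns a free-form model answer into a dependable structured record; a production system would more likely use full JSON Schema validation.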


Before adopting Gemini 2.5 Pro, Firecrawl evaluated several leading models. They found that other models struggled to handle the complexity and variability of real-world web content at production scale.

"Gemini 2.5 Pro made the entire project feasible," said Eric Ciarla, Co-founder of Firecrawl. "Before using Gemini 2.5 Pro, the models we tested couldn’t reliably handle the level of complexity required to extract and reason over real-world web content. Gemini 2.5 Pro’s reasoning capabilities, accuracy, and stability enabled us to move forward with confidence."

Implementing Gemini 2.5 Pro with tool calling

Firecrawl integrated Gemini 2.5 Pro into its products in approximately one week, using the model’s reasoning and tool-calling capabilities within its agent architecture.

In the FIRE-1 agent framework, the model operates within an agent loop that combines Gemini 2.5 Pro’s reasoning with deterministic control flows. The process works as follows:

  • Input: The agent receives the webpage Document Object Model (DOM) and a defined user goal (e.g., "get me all the pages on this website").
  • Reasoning: Gemini 2.5 Pro analyzes the inputs and determines the necessary actions.
  • Execution: The model requests these actions through tool calling (function calls), and the agent executes them. For navigation tasks, for example, the agent might autonomously invoke a function like click(next_page) to retrieve the required data.


This approach allows Firecrawl to handle complex web navigation and extraction tasks that require both flexibility and predictability.
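
The Input, Reasoning, and Execution steps can be sketched as a small Python loop. Everything here is hypothetical: the toy SITE, the tools extract_links and click, and scripted_model, a fixed policy standing in for Gemini 2.5 Pro. In the Gemini API, tools are declared to the model as function declarations and the model replies with a function call (a name plus arguments) that the application then executes; ToolCall plays that role here. The deterministic parts (the tool registry, the unknown-tool check, and the step budget) are what keep the model's choices predictable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class ToolCall:
    """Stand-in for a function call returned by the model, e.g. click(target="next_page")."""
    name: str
    args: Dict[str, str] = field(default_factory=dict)


# Hypothetical toy site: page id -> (links on that page, id of the next page or None).
SITE: Dict[str, Tuple[List[str], Optional[str]]] = {
    "p1": (["/a", "/b"], "p2"),
    "p2": (["/c"], "p3"),
    "p3": (["/d"], None),
}


@dataclass
class Browser:
    """Deterministic environment that the tools act on."""
    page: str = "p1"
    collected: List[str] = field(default_factory=list)
    done: Set[str] = field(default_factory=set)

    def dom(self) -> str:
        # Simplified serialization standing in for the real DOM the agent would receive.
        links, nxt = SITE[self.page]
        return f"page={self.page} links={links} next={nxt} extracted={self.page in self.done}"


def extract_links(b: Browser) -> None:
    """Tool: record the links on the current page."""
    b.collected.extend(SITE[b.page][0])
    b.done.add(b.page)


def click(b: Browser, target: str) -> None:
    """Tool: follow a link; only "next_page" exists in this toy site."""
    nxt = SITE[b.page][1]
    if target != "next_page" or nxt is None:
        raise ValueError(f"cannot click {target!r} on {b.page}")
    b.page = nxt


TOOLS: Dict[str, Callable[..., None]] = {"extract_links": extract_links, "click": click}


def run_agent(goal: str, decide: Callable[[str, str], ToolCall], max_steps: int = 10) -> List[str]:
    """Agent loop: the model picks an action, fixed code validates and runs it."""
    b = Browser()
    for _ in range(max_steps):              # deterministic step budget
        call = decide(goal, b.dom())        # reasoning: the model proposes a tool call
        if call.name == "finish":
            return b.collected
        if call.name not in TOOLS:          # reject tools that were never declared
            raise ValueError(f"unknown tool {call.name!r}")
        TOOLS[call.name](b, **call.args)    # execution: the agent runs the call
    raise RuntimeError("step budget exhausted before the goal was met")


def scripted_model(goal: str, dom: str) -> ToolCall:
    """Stand-in for Gemini 2.5 Pro: a fixed policy that reads the serialized DOM."""
    if "extracted=False" in dom:
        return ToolCall("extract_links")
    if "next=None" not in dom:
        return ToolCall("click", {"target": "next_page"})
    return ToolCall("finish")


print(run_agent("get me all the pages on this website", scripted_model))
# ['/a', '/b', '/c', '/d']
```

Swapping scripted_model for a real model call would leave run_agent unchanged, which is the appeal of separating the model's reasoning from the surrounding control flow.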

Achieving 98% extraction accuracy

In internal benchmarks comparing extraction accuracy and complex web parsing, Gemini 2.5 Pro significantly outperformed other models Firecrawl evaluated.

Gemini 2.5 Pro achieved 98% accuracy in Firecrawl's internal evaluations. The next-best model tested reached approximately 80% accuracy. This performance increase translated directly into higher-quality extraction outputs and more reliable agent behavior in production workloads.
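
Read as error rates, the gap is wider than the 18-point difference suggests. Assuming accuracy here means the fraction of extractions judged correct on the same evaluation set (the scoring method is not described), the arithmetic is:

```latex
\begin{aligned}
\text{error rate}_{\text{Gemini 2.5 Pro}} &= 1 - 0.98 = 0.02 \\
\text{error rate}_{\text{next best}} &\approx 1 - 0.80 = 0.20 \\
\frac{0.20}{0.02} &= 10
\end{aligned}
```

Under that reading, the next-best model produced roughly ten times as many incorrect extractions as Gemini 2.5 Pro.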

"In our internal testing, Gemini 2.5 Pro consistently outperformed the alternatives across every key dimension for our use case: extraction accuracy, complex reasoning, latency, and overall throughput," Ciarla noted.

Building the future of web interaction

Gemini models are now a foundational component of Firecrawl’s AI infrastructure, enabling them to provide reliable web data pipelines for AI products.

Firecrawl is currently evaluating Gemini 2.5 Flash for use cases requiring ultra-low latency, where real-time agentic interaction is critical. As the Gemini model family evolves, Firecrawl plans to integrate new capabilities to further improve how AI agents interact with real-world web data.
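
One way a pipeline could act on such an evaluation is to route requests by latency budget. The Python sketch below is purely illustrative: Firecrawl has said only that it is evaluating Gemini 2.5 Flash, so the Task fields, the task labels, and the 1,000 ms REALTIME_THRESHOLD_MS cut-off are hypothetical. Only the model identifiers gemini-2.5-pro and gemini-2.5-flash are the real Gemini API names.

```python
from dataclasses import dataclass

# Real Gemini API model identifiers; the routing rule around them is hypothetical.
PRO_MODEL = "gemini-2.5-pro"
FLASH_MODEL = "gemini-2.5-flash"

# Hypothetical cut-off: below this budget, prefer the lower-latency model.
REALTIME_THRESHOLD_MS = 1000


@dataclass(frozen=True)
class Task:
    kind: str               # illustrative labels: "extraction" or "agent_step"
    latency_budget_ms: int  # how long the caller can wait for a reply


def choose_model(task: Task) -> str:
    """Send latency-critical interactive agent steps to Flash, everything else to Pro."""
    if task.kind == "agent_step" and task.latency_budget_ms < REALTIME_THRESHOLD_MS:
        return FLASH_MODEL
    return PRO_MODEL


print(choose_model(Task("agent_step", 300)))   # gemini-2.5-flash
print(choose_model(Task("extraction", 300)))   # gemini-2.5-pro
```

Keeping extraction on Pro regardless of latency reflects the accuracy gap reported in Firecrawl's internal evaluations; interactive agent steps are where a faster model would matter most.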

To start building your own applications, explore the capabilities of Gemini models in our API documentation.