AUG 29, 2025

InstaLILY: An agentic enterprise search engine, powered by Gemini

Amit Shah

CEO & Co-Founder, Instalily.ai

Matt Ridenour

Head of Accelerator & Startup Ecosystem USA, Google

Enterprise AI agents that automate complex workflows, like B2B sales or industrial maintenance, require models trained on vast amounts of high-quality, domain-specific data. For many companies, creating this data is a primary bottleneck, as manual labeling is slow and expensive, and generic models can lack the necessary nuance.

InstaLILY AI, an enterprise platform for autonomous, vertical AI agents, helps companies automate complex workflows in sales, service, and operations. For one of its clients, PartsTown, InstaLILY needed to build a real-time search engine that lets AI agents instantly match field service technicians with specific replacement parts from a catalog of more than five million items. This required a scalable way to generate millions of high-quality labels for model training.

To solve this, InstaLILY AI developed a multi-stage synthetic data generation pipeline. The pipeline uses a teacher-student architecture, with Gemini 2.5 Pro acting as the “teacher” model to generate gold-standard training data, and a fine-tuned Gemma model as the “student” to enable scalable, low-cost production deployment.

The challenge of creating specialized training data at scale

The core of the parts search engine is a relevancy model that connects a service technician's query (e.g., "compressor for a Northland refrigerator") to the exact part number. Training this model required a massive dataset of query-part pairs.

InstaLILY AI faced several challenges with traditional methods:

  • Scalability: Manually labeling millions of work-order lines was not feasible.
  • Cost and quality: Using other frontier models for labeling was three times more expensive and resulted in 15% lower agreement rates compared to their final solution.
  • Performance: A live LLM-powered search would be too slow, with initial tests showing two-minute latency, and unable to handle the required 500+ queries per second (QPS) in production.


They needed a system that could generate high-quality data cost-effectively and yield a fast, accurate production model.

A three-stage pipeline with Gemini and Gemma

InstaLILY AI engineered a three-stage pipeline that uses Gemini 2.5 Pro's advanced reasoning to create high-quality labels and then distills that knowledge into smaller, more efficient models for production.

The pipeline works as follows:

  • Synthetic data generation (teacher model): Gemini 2.5 Pro generates gold-standard labels for query-part pairs. To achieve high accuracy, InstaLILY AI uses multi-perspective chain-of-thought (Multi-CoT) reasoning, prompting the model to analyze parts from multiple angles, including brand, category, specifications, and complex business logic for compatibility. This approach achieved 94% agreement with human experts on a blind test set.
  • Student model training: The high-quality labels from Gemini 2.5 Pro are used to fine-tune Gemma-7B. InstaLILY AI used several techniques to optimize the student model, including Direct Preference Optimization (DPO), which reduced false positives by 40%. They also created an ensemble of three fine-tuned Gemma variants that vote on each sample, increasing label precision to 96%.
  • Production serving: The knowledge from the Gemma models is distilled into a lightweight BERT model (110M parameters) for the final production environment. This smaller model maintains 89% F1-score accuracy while serving requests at 600 QPS.
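The labeling and ensemble steps above can be sketched in a few lines. This is an illustrative reconstruction, not InstaLILY's actual code: the perspective list, prompt wording, and function names are assumptions, and the call to the teacher model is omitted.

```python
# Sketch of multi-perspective CoT labeling plus student-ensemble voting.
# All names and the prompt template are hypothetical.
from collections import Counter

PERSPECTIVES = [
    "brand match",
    "part category",
    "technical specifications",
    "compatibility business rules",
]

def build_multi_cot_prompt(query: str, part_description: str) -> str:
    """Prompt the teacher model to reason from several angles before labeling."""
    steps = "\n".join(
        f"{i}. Analyze the pair from the angle of {p}."
        for i, p in enumerate(PERSPECTIVES, start=1)
    )
    return (
        f"Query: {query}\n"
        f"Candidate part: {part_description}\n"
        f"{steps}\n"
        "Then answer RELEVANT or NOT_RELEVANT."
    )

def ensemble_label(votes: list[str]) -> str:
    """Majority vote across the three fine-tuned student variants."""
    return Counter(votes).most_common(1)[0][0]

# Example: two of three student variants agree, so the pair is labeled RELEVANT.
label = ensemble_label(["RELEVANT", "RELEVANT", "NOT_RELEVANT"])
```

In practice the prompt would be sent to Gemini 2.5 Pro and the resulting gold labels used as fine-tuning targets for the Gemma students; the vote shown here is the precision-boosting step the team describes.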


“Without LLM’s chain-of-thought labeling to bootstrap our distilled model, we’d be hand-tagging an enormous amount of data,” said the InstaLILY AI team. “Gemini significantly accelerated data preparation and allowed us to reallocate hundreds of engineering hours to higher-leverage tasks like fine-tuning and orchestration.”

Reducing latency by 99.8% and costs by 98.3%

The teacher-student architecture delivered significant improvements in speed, cost, and accuracy.

The final system achieved:

  • Query latency reduction: From 2 minutes to 0.2 seconds (a 99.8% improvement).
  • Serving cost reduction: From $0.12 to $0.002 per 1,000 queries (a 98.3% reduction).
  • High accuracy: ~90% F1-score on a blind hold-out dataset.


The development process was also accelerated. The team built a prototype in 48 hours and a production-ready pipeline in four weeks—a process they estimate would have taken three to four months without the Gemini and Gemma ecosystem.

“Being part of the Google Accelerator unlocked this entire approach,” said Amit Shah, Founder & CEO of InstaLILY. “The hands-on technical support, early access to Gemini and Gemma, and generous Cloud credits helped us move from prototype to production in weeks—not months.”

Future development with multimodal and continuous learning

InstaLILY AI plans to expand the capabilities of its AI agents by incorporating Gemini’s multimodal features. This will allow technicians to upload a photo of a broken unit to aid in diagnosis. They are also developing a continuous active-learning service that flags low-confidence live queries, routes them to Gemini for annotation, and retrains the production models weekly.
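The active-learning loop described above can be sketched as a simple confidence gate. The threshold value and function names here are assumptions for illustration; InstaLILY has not published its actual routing logic.

```python
# Hypothetical sketch of the continuous active-learning routing step:
# low-confidence live predictions are queued for teacher re-annotation.
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff, not a published figure

def route_query(query: str, confidence: float, annotation_queue: list) -> None:
    """Queue low-confidence queries; later they are sent to Gemini for labeling."""
    if confidence < CONFIDENCE_THRESHOLD:
        annotation_queue.append(query)

queue: list = []
route_query("compressor for Northland fridge", 0.55, queue)  # flagged
route_query("door gasket True T-49", 0.95, queue)            # served as-is
```

On a weekly cadence, the queued queries would be annotated by the teacher model and folded back into the training set before the production models are retrained.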

The success of InstaLILY AI's search engine for its AI agents demonstrates how a teacher-student architecture, combining the reasoning power of Gemini 2.5 Pro with the efficiency of fine-tuned Gemma models, can solve complex data generation challenges and enable high-performance, scalable AI applications.

To start building with Gemini and Gemma models, read our API documentation.