# InstaLILY: An agentic enterprise search engine, powered by Gemini

AUG 29, 2025

Amit Shah, CEO & Co-Founder, Instalily.ai
Matt Ridenour, Head of Accelerator & Startup Ecosystem USA, Google

Enterprise AI agents that automate complex workflows, like B2B sales or industrial maintenance, require models trained on vast amounts of high-quality, domain-specific data. For many companies, creating this data is a primary bottleneck, as manual labeling is slow and expensive, and generic models can lack the necessary nuance.

[InstaLILY AI](https://instalily.ai/), an enterprise platform for autonomous and vertical AI agents, helps companies automate and run complex workflows in sales, service, and operations. For one of their clients, PartsTown, they needed to build a real-time search engine for AI agents to instantly match field service technicians with specific replacement parts from a catalog of over five million items. This required a scalable way to generate millions of high-quality labels for model training.

To solve this, InstaLILY AI developed a multi-stage synthetic data generation pipeline.
The pipeline uses a teacher-student architecture, with Gemini 2.5 Pro acting as the "teacher" model to generate gold-standard training data, and a fine-tuned Gemma model as the "student" to enable scalable, low-cost production deployment.

## The challenge of creating specialized training data at scale

The core of the parts search engine is a relevancy model that connects a service technician's query (e.g., "compressor for a Northland refrigerator") to the exact part number. Training this model required a massive dataset of query-part pairs.

InstaLILY AI faced several challenges with traditional methods:

- **Scalability:** Manually labeling millions of work-order lines was not feasible.
- **Cost and quality:** Using other frontier models for labeling was three times more expensive and resulted in 15% lower agreement rates compared to their final solution.
- **Performance:** A live LLM-powered search would be too slow, with initial tests showing two-minute latency, and unable to handle the required 500+ queries per second (QPS) in production.

They needed a system that could cost-effectively generate high-quality data, leading to a fast and accurate final model.

## A three-stage pipeline with Gemini and Gemma

InstaLILY AI engineered a three-stage pipeline that uses Gemini 2.5 Pro's advanced reasoning to create high-quality labels and then distills that knowledge into smaller, more efficient models for production.

The pipeline works as follows:

- **Synthetic data generation (teacher model):** Gemini 2.5 Pro generates gold-standard labels for query-part pairs. To achieve high accuracy, InstaLILY AI uses multi-perspective chain-of-thought (Multi-CoT) reasoning, prompting the model to analyze parts from multiple angles, including brand, category, specifications, and complex business logic for compatibility.
This approach achieved 94% agreement with human experts on a blind test set.
- **Student model training:** The high-quality labels from Gemini 2.5 Pro are used to fine-tune Gemma-7B. InstaLILY AI used several techniques to optimize the student model, including Direct Preference Optimization (DPO), which reduced false positives by 40%. They also created an ensemble of three fine-tuned Gemma variants that vote on each sample, increasing label precision to 96%.
- **Production serving:** The knowledge from the Gemma models is distilled into a lightweight BERT model (110M parameters) for the final production environment. This smaller model maintains 89% F1-score accuracy while serving requests at 600 QPS.

**"Without LLM's chain-of-thought labeling to bootstrap our distilled model, we'd be hand-tagging an enormous amount of data,"** said the InstaLILY AI team. **"Gemini significantly accelerated data preparation and allowed us to reallocate hundreds of engineering hours to higher leverage tasks like fine-tuning and orchestration."**

## Reducing latency by 99.8% and costs by 98.3%

The teacher-student architecture delivered significant improvements in speed, cost, and accuracy.

The final system achieved:

- **Query latency reduction:** From 2 minutes to 0.2 seconds (a 99.8% improvement).
- **Serving cost reduction:** From $0.12 to $0.002 per 1,000 queries (a 98.3% reduction).
- **High accuracy:** ~90% F1-score on a blind hold-out dataset.

The development process was also accelerated. The team built a prototype in 48 hours and a production-ready pipeline in four weeks, a process they estimate would have taken three to four months without the Gemini and Gemma ecosystem.
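The ensemble-voting step in the student-training stage can be sketched as simple majority voting. This is a minimal illustration under stated assumptions, not InstaLILY AI's actual code: the `students` callables stand in for the three fine-tuned Gemma variants, and the label names are hypothetical.

```python
from collections import Counter

def ensemble_label(query: str, part_id: str, students) -> str:
    """Majority vote across student-model variants (illustrative sketch)."""
    votes = [model(query, part_id) for model in students]
    # The label most variants agree on wins the vote.
    return Counter(votes).most_common(1)[0][0]

# Stand-ins for the three fine-tuned Gemma variants; in production each
# would be a real model call returning a relevancy label.
students = [
    lambda q, p: "relevant",
    lambda q, p: "relevant",
    lambda q, p: "not_relevant",
]

label = ensemble_label("compressor for a Northland refrigerator", "PT-1001", students)
print(label)  # two of three variants agree: "relevant"
```

Voting across several variants is a standard way to trade extra inference cost for higher label precision, which matches the 96% precision figure the team reports for the ensemble.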
"Being part of the [Google Accelerator](https://startup.google.com/programs/accelerator/) unlocked this entire approach," said Amit Shah, Founder & CEO of InstaLILY. "The hands-on technical support, early access to Gemini and Gemma, and generous Cloud credits helped us move from prototype to production in weeks, not months."

## Future development with multimodal and continuous learning

InstaLILY AI plans to expand the capabilities of its AI agents by incorporating Gemini's multimodal features. This will allow technicians to upload a photo of a broken unit to aid in diagnosis. They are also developing a continuous active-learning service that flags low-confidence live queries, routes them to Gemini for annotation, and retrains the production models weekly.

The success of InstaLILY AI's search engine for their AI agents demonstrates how a teacher-student architecture, combining the reasoning power of Gemini 2.5 Pro with the efficiency of fine-tuned Gemma models, can solve complex data generation challenges and enable high-performance, scalable AI applications.

To start building with Gemini and Gemma models, read our [API documentation](https://ai.google.dev/gemini-api/docs).
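The flagging step of the active-learning loop described above could be sketched as follows. The 0.7 confidence threshold and the field names are illustrative assumptions; the article does not specify them.

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; not a figure from the article

def flag_low_confidence(predictions):
    """Collect live queries whose relevancy score falls below the threshold,
    so they can be routed to the teacher model for re-annotation."""
    return [p["query"] for p in predictions if p["score"] < CONFIDENCE_THRESHOLD]

live_predictions = [
    {"query": "compressor for a Northland refrigerator", "score": 0.95},
    {"query": "door gasket for an unknown model", "score": 0.41},
]

queue = flag_low_confidence(live_predictions)
print(queue)  # only the low-confidence query is queued for annotation
```

Queries collected this way would be sent to Gemini for labeling and folded into the weekly retraining runs the team describes.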