Auto-Q: Automated Domain Questions Generation for Industrial Assets
Abstract
Industrial data scientists are expected to collaborate closely with subject matter experts to solve business problems. One of their core tasks is to build domain understanding by asking questions about a given industrial asset or process. Asking a question is an art, and asking it at the right time is a skill. This paper aims to leverage the generative AI capabilities of large language models (LLMs) to help data scientists navigate the realm of domain-specific, problem-centric question generation.
We propose a system called Auto-Q, which automatically generates domain-specific questions for industrial systems. At the core of Auto-Q is a multi-round process built on a multi-agent (currently five agents) interaction framework that generates instruction-response pairs for building synthetic questions, using a mixture of zero-shot, in-context, chain-of-question, and other prompting techniques. The agents communicate partly through predefined prompts and partly through recommendations generated by LLMs. The multi-round design also addresses concerns raised by the community about the decline of linguistic diversity in LLMs trained on synthetic text: we continuously monitor the set of newly generated instruction sets at the end of each round.
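The multi-round, multi-agent loop with per-round diversity monitoring can be sketched as follows. This is a minimal illustration, not the paper's implementation: the persona names, the templated stand-in agents, the type-token-ratio diversity proxy, and the stopping threshold are all hypothetical assumptions.

```python
# Hypothetical sketch of a multi-round, multi-agent question-generation loop
# with a diversity monitor applied after each round. Agent behavior, the
# diversity metric, and the threshold below are illustrative assumptions.

def lexical_diversity(questions):
    """Type-token ratio over all generated questions (a simple diversity proxy)."""
    tokens = [t for q in questions for t in q.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def make_agent(persona):
    """Stand-in for an LLM-backed persona agent: templated question generation."""
    def agent(asset, prior):
        return (f"As a {persona}, what should we know about {asset} "
                f"given {len(prior)} prior questions?")
    return agent

def generate_questions(asset, personas, rounds=3, min_diversity=0.3):
    agents = [make_agent(p) for p in personas]
    questions = []
    for _ in range(rounds):
        for agent in agents:
            questions.append(agent(asset, questions))
        # Monitor the newly grown question set at the end of each round;
        # stop early if linguistic diversity collapses.
        if lexical_diversity(questions) < min_diversity:
            break
    return questions

qs = generate_questions("centrifugal pump",
                        ["reliability engineer", "process operator"])
```

In a full system, each agent call would be an LLM invocation conditioned on its persona prompt and on recommendations from the other agents; the diversity check guards against the degenerate repetition that synthetic-text loops can produce.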
We empirically demonstrate that our proposed system combines art and skill, two key ingredients for improving coordination and communication across multiple personas. We have deployed a lightweight, any-time, dataset-free, annotation-free, and AI-principle-driven solution, made available to the internal community for feedback.