Saurabh Paul, Christos Boutsidis, et al.
JMLR
Agentic AI is poised to redefine intelligent decision-making in Industry 4.0 by enabling autonomous systems to perceive, reason, and act over multimodal data sources such as sensor streams, structured knowledge bases, and unstructured maintenance logs. Despite the rapid progress of large language models and multimodal learning, building deployable and trustworthy agentic systems remains challenging due to fragmented data silos, integration complexity, lack of explainability, and limited evaluation protocols. This tutorial offers a comprehensive, hands-on walkthrough of the full lifecycle of multimodal agentic AI systems, from inception to productization. Participants will begin by learning foundational agent architectures and reasoning strategies, including Plan-Execute, ReAct, Reflexion, RAFA, and hybrid symbolic-neural approaches. Building on these concepts, the tutorial features two interactive lab sessions: (i) addressing integration challenges in smart manufacturing using an open-source multi-agent environment, and (ii) benchmarking agent performance, reasoning, and explainability in an enterprise-scale industrial simulation. Along the way, participants will engage with state-of-the-art tools and workflows for data integration, trace alignment, and failure diagnosis. They will explore evaluation techniques that combine human feedback, sensor-grounded truth, and emerging LLM-as-a-judge methods to assess reliability and trustworthiness. Additional emphasis will be placed on governance and monitoring frameworks, as well as visualizations for traceability and introspection, enabling participants to design explainable and auditable deployments. The tutorial leverages two open-source environments: AssetOpsBench, a simulation platform with 141 curated industrial scenarios, and SmartPilot, a smart manufacturing multi-agent system for multimodal data integration and reasoning. Both provide participants with realistic data and experimental workflows. Together, these environments demonstrate how to evaluate agents under diverse operating conditions, such as anomaly detection, fault progression, and post-maintenance recovery.
Fabio Lorenzi, Abigail Langbridge, et al.
AAAI 2026
Joxan Jaffar
Journal of the ACM
Cristina Cornelio, Judy Goldsmith, et al.
JAIR