Scalable Distributed Computing Systems for Incremental Machine Learning in Big Data Applications
Abstract
IoT sensors generate terabytes of raw data every day, and this data is indispensable for investigating time-series problems such as short-term forecasting of a target variable and failure prediction. Pure batch learning algorithms struggle with such high-frequency, high-volume data, because concept drift forces frequent retraining of the deployed models, which leads to significant downtime. Incremental models and coupled batch-incremental models are therefore gaining importance as a way to handle these problems. In this talk, we will present a distributed computing system that scales incremental learning to big data and efficiently performs a parameter search, so that the most efficient incremental modelling pipeline is generated dynamically with every new stream of incoming data. We will then walk through synthetic and real-world use cases.
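To make the idea of per-stream pipeline selection concrete, here is a minimal single-process sketch. It is not the system presented in the talk. It assumes a pool of candidate incremental models that differ only in learning rate, scores each candidate on a new batch before training on it (test-then-train), and picks the candidate with the lowest error for that batch. The names `OnlineLinearModel`, `process_batch`, `make_batch` and `run_demo`, the candidate learning rates, and the synthetic drifting stream are all hypothetical and chosen for illustration only.

```python
import random


class OnlineLinearModel:
    """Linear regressor updated one sample at a time with SGD (illustrative)."""

    def __init__(self, n_features, learning_rate):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        # Gradient step on the squared error of a single sample.
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def process_batch(models, batch):
    """Test-then-train: score every candidate on each sample, then update it.

    Returns the learning rate with the lowest mean squared error on this
    batch, together with the per-candidate losses.
    """
    losses = {}
    for lr, model in models.items():
        squared_error = 0.0
        for x, y in batch:
            squared_error += (model.predict(x) - y) ** 2
            model.learn_one(x, y)
        losses[lr] = squared_error / len(batch)
    best_lr = min(losses, key=losses.get)
    return best_lr, losses


def make_batch(rng, slope, size=200):
    """Synthetic stream: y = slope * x + small Gaussian noise."""
    batch = []
    for _ in range(size):
        x = rng.uniform(-1.0, 1.0)
        batch.append(([x], slope * x + rng.gauss(0.0, 0.05)))
    return batch


def run_demo(seed=0):
    rng = random.Random(seed)
    # Candidate "pipelines": same model, different learning rates (assumed).
    models = {lr: OnlineLinearModel(1, lr) for lr in (0.001, 0.05, 0.3)}
    history = []
    for step in range(10):
        # Concept drift: the true slope changes halfway through the stream.
        slope = 2.0 if step < 5 else -1.0
        best_lr, losses = process_batch(models, make_batch(rng, slope))
        history.append((best_lr, losses[best_lr]))
    return models, history


if __name__ == "__main__":
    _, history = run_demo()
    for step, (best_lr, loss) in enumerate(history):
        print(f"batch {step}: best learning rate={best_lr}, mse={loss:.4f}")
```

Because every candidate keeps learning from every batch, no model has to be retrained from scratch when the data drifts. In a distributed setting the loop over candidates is the natural unit to parallelise, although how the presented system partitions that work is not stated in the abstract.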