SPC: A distributed, scalable platform for data mining
Abstract
The Stream Processing Core (SPC) is distributed stream processing middleware designed to support applications that extract information from a large number of digital data streams. In this paper, we describe the SPC programming model which, to the best of our knowledge, is the first to support stream-mining applications using a subscription-like model for specifying stream connections as well as to provide support for non-relational operators. This enables stream-mining applications to tap into, analyze and track an ever-changing array of data streams which may contain information relevant to the streaming-queries placed on it. We describe the design, implementation, and experimental evaluation of the SPC distributed middleware, which deploys applications on to the running system in an incremental fashion, making stream connections as required. Using micro-benchmarks and a representative large-scale synthetic stream-mining application, we evaluate the performance of the control and data paths of the SPC middleware. Copyright 2006 ACM.