On k-Means cluster preservation using quantization schemes
Abstract
This work examines under what conditions compression methodologies can retain the outcome of clustering operations. We focus on the popular k-Means clustering algorithm and we demonstrate how a properly constructed compression scheme based on post-clustering quantization is capable of maintaining the global cluster structure. Our analytical derivations indicate that a 1-bit moment preserving quantizer per cluster is sufficient to retain the original data clusters. Merits of the proposed compression technique include: a) reduced storage requirements with clustering guarantees, b) data privacy on the original values, and c) shape preservation for data visualization purposes. We evaluate quantization scheme on various high-dimensional datasets, including 1-dimensional and 2-dimensional time-series (shape datasets) and demonstrate the cluster preservation property. We also compare with previously proposed simplification techniques in the time-series area and show significant improvements both on the clustering and shape preservation of the compressed datasets. © 2009 IEEE.