Mining significant graph patterns by leap search
Xifeng Yan, Hong Cheng, et al.
SIGMOD 2008
High-dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high-dimensional data, even the concept of proximity or clustering may not be meaningful. We discuss very general techniques for projected clustering that can construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques, which limit the method to projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of redefining clustering for high-dimensional applications by searching for hidden subspaces whose clusters are created by inter-attribute correlations. We introduce the concept of extended cluster feature vectors to make the algorithm scalable to very large databases. The running time and space requirements of the algorithm are adjustable and can be traded off against accuracy.
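The abstract does not say what an extended cluster feature vector contains, so the following is only a minimal sketch under an assumption: that a cluster is summarized by its point count, its per-attribute sums, and its sums of pairwise attribute products. The class name `ClusterSummary` and its methods are hypothetical and are not taken from the paper. Such a summary is additive, so two clusters can be merged without revisiting their points, and it is enough to recover the covariance matrix that reflects inter-attribute correlations.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusterSummary:
    """Hypothetical additive summary of a cluster (an assumption, not the paper's definition)."""

    dim: int
    count: int = 0
    linear_sum: List[float] = field(default_factory=list)
    cross_sum: List[List[float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from all-zero statistics when none are supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.cross_sum:
            self.cross_sum = [[0.0] * self.dim for _ in range(self.dim)]

    def add(self, point: Sequence[float]) -> None:
        """Fold one point into the summary in O(dim^2) time."""
        self.count += 1
        for i in range(self.dim):
            self.linear_sum[i] += point[i]
            for j in range(self.dim):
                self.cross_sum[i][j] += point[i] * point[j]

    def merge(self, other: "ClusterSummary") -> "ClusterSummary":
        """Return the summary of the union of two clusters by adding their statistics."""
        merged = ClusterSummary(self.dim)
        merged.count = self.count + other.count
        merged.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        merged.cross_sum = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.cross_sum, other.cross_sum)
        ]
        return merged

    def covariance(self) -> List[List[float]]:
        """Population covariance matrix: E[x_i * x_j] - E[x_i] * E[x_j]."""
        n = self.count
        mean = [s / n for s in self.linear_sum]
        return [
            [self.cross_sum[i][j] / n - mean[i] * mean[j] for j in range(self.dim)]
            for i in range(self.dim)
        ]


# Usage: summarize two small clusters separately, then merge them.
a = ClusterSummary(dim=2)
for p in [(1.0, 2.0), (3.0, 4.0)]:
    a.add(p)
b = ClusterSummary(dim=2)
b.add((5.0, 6.0))
merged = a.merge(b)
```

The storage cost of this sketch depends on the dimensionality and not on the number of points, which is one plausible reading of how a summary of this kind could help with very large databases.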