Clustering aims to find the intrinsic structure of data by organizing data objects into similarity groups, or clusters. It is often called unsupervised learning because no class labels denoting an a priori partition of the objects are given. This is in contrast with supervised learning (e.g., classification), in which the data objects are already labeled with known classes. Past research in clustering has produced many algorithms, but these algorithms have some major shortcomings. In this paper, we propose a novel clustering technique based on a supervised learning technique called decision tree construction. The new technique is able to overcome many of these shortcomings. The key idea is to use a decision tree to partition the data space into cluster and empty (sparse) regions at different levels of detail. The technique is able to find "natural" clusters in large high-dimensional spaces efficiently. It is suitable for clustering in the full dimensional space as well as in subspaces, and it also provides comprehensible descriptions of the clusters. Experimental results on both synthetic and real-life data show that the technique is effective and scales well to large high-dimensional datasets.
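To make the key idea concrete, the sketch below approximates it under assumptions that go beyond the abstract and is not necessarily the authors' actual algorithm: the real data points are contrasted with uniformly scattered synthetic points so that an off-the-shelf classifier can learn the cuts, scikit-learn's DecisionTreeClassifier stands in for a purpose-built tree, and the function name find_cluster_regions along with all parameter choices is purely illustrative. Leaves dominated by real points are reported as "cluster" regions; the remaining leaves correspond to empty (sparse) regions.

# Hypothetical sketch of tree-based space partitioning into dense ("cluster")
# and sparse ("empty") regions. Assumptions not stated in the abstract:
# real points are contrasted with uniformly sampled synthetic points, and
# scikit-learn's DecisionTreeClassifier replaces the paper's own tree builder.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def find_cluster_regions(X, min_purity=0.8, max_depth=6, random_state=0):
    """Return axis-aligned hyper-rectangles whose leaves are mostly real points."""
    rng = np.random.default_rng(random_state)
    lo, hi = X.min(axis=0).astype(float), X.max(axis=0).astype(float)

    # Label real points 1 and uniform synthetic points 0, so the tree learns
    # cuts that separate dense regions from sparse ones.
    noise = rng.uniform(lo, hi, size=X.shape)
    data = np.vstack([X, noise])
    labels = np.r_[np.ones(len(X)), np.zeros(len(noise))]

    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
    tree.fit(data, labels)

    t = tree.tree_
    regions = []

    def walk(node, box_lo, box_hi):
        # Leaf: keep the box if the leaf is dominated by real (class 1) points.
        if t.children_left[node] == -1:
            counts = t.value[node][0]  # per-class counts or proportions [empty, real]
            if counts[1] / counts.sum() >= min_purity:
                regions.append((box_lo.copy(), box_hi.copy()))
            return
        # Internal node: shrink the bounding box along the split dimension.
        f, thr = t.feature[node], t.threshold[node]
        left_hi = box_hi.copy()
        left_hi[f] = min(box_hi[f], thr)
        right_lo = box_lo.copy()
        right_lo[f] = max(box_lo[f], thr)
        walk(t.children_left[node], box_lo, left_hi)
        walk(t.children_right[node], right_lo, box_hi)

    walk(0, lo.copy(), hi.copy())
    return regions


if __name__ == "__main__":
    # Two Gaussian blobs in 2-D: the recovered rectangles should roughly cover
    # the two dense areas and exclude the sparse space between them.
    rng = np.random.default_rng(1)
    blob1 = rng.normal([0, 0], 0.3, size=(200, 2))
    blob2 = rng.normal([5, 5], 0.3, size=(200, 2))
    for box_lo, box_hi in find_cluster_regions(np.vstack([blob1, blob2])):
        print("cluster region:", np.round(box_lo, 2), "to", np.round(box_hi, 2))

The purity threshold plays the role of deciding which leaves count as clusters; varying max_depth controls the "levels of detail" mentioned in the abstract, with shallower trees giving coarser regions and deeper trees giving finer ones.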