Publication
ICPR 2004
Conference paper

Order-preserving clustering and its application to gene expression data

View publication

Abstract

Clustering of ordered data sets is a common problem faced in many pattern recognition tasks. Existing clustering methods either fail to capture the data or use restrictive models such as HMMs or AR models to model the data. In this paper, we present a general order-preserving clustering algorithm that allows arbitrary patterns of data evolution by representing each ordered set as a curve. Clustering of the data then reduces to grouping curves based on shape similarity. We develop a novel measure of shape similarity between curves using scale-space distance. Shape similarity or dis-similarity is judged by composing higher-dimensional curves from constituent curves and noting the additional twists and turns in such curves that can be attributed to shape differences. An algorithm analogous to K-means clustering is then developed that uses prototypical curves for representing clusters. Results are demonstrated on ordered gene expression data sets obtained from gene chips.

Date

Publication

ICPR 2004

Authors

Topics

Share