The IGrid index: Reversing the dimensionality curse for similarity indexing in high dimensional space
Abstract
The similarity search and indexing problem is well known to be a difficult one for high dimensional applications. Most indexing structures show a rapid degradation with increasing dimensionality which leads to an access of the entire database for each query. Furthermore, recent research results show that in high dimensional space, even the concept of similarity may not be very meaningful. In this paper, we propose the IGrid-index; a method for similarity indexing which uses a distance function whose meaningfulness is retained with increasing dimensionality. In addition, this technique shows performance which is unique to all known index structures; the percentage of data accessed is inversely proportional to the overall data dimensionality. Thus, this technique relies on the dimensionality to be high in order to provide performance efficient similarity results. The IGrid-index can also support a special kind of query which we refer to as projected range queries; a query which is increasingly relevant for very high dimensional data mining applications.