Publication
CIKM 2005
Conference paper
Taxonomies by the numbers: Building high-performance taxonomies
Abstract
In this paper, we describe a system for constructing taxonomies that yield high accuracy with automated categorization systems, even on Web and intranet documents. In particular, we describe how measuring five key features of the system can be used to predict when categories are sufficiently well defined to yield high-accuracy categorization. We describe the use of this system to construct a large (8800-category) general-purpose taxonomy and categorization system. Copyright 2005 ACM.