Supervised selection of prototypes for classification
Abstract
Given sufficient samples of data tagged with their class identities, three supervised techniques for constructing prototypes to represent these classes are examined. The first method averages the tokens of each class separately to obtain the prototypes. In the second approach, several tokens, picked uniformly from each class, are designated as prototypes. The third technique uses a systematic search procedure to select effective prototypes and discard obsolete ones. Approximately two hours of continuous speech data from each of two speakers were used for experimentation. Each centisecond frame of speech was labeled with one of 200 phonetic subunit names using hidden Markov model training and Viterbi alignment procedures. Prototypes were determined from the first part of the data, whereas the last part served to measure the classification performance. Average accuracies were 24.2% with 200 prototypes for the first method, 31.5% with 32,000 prototypes for the second, and 38.5% with 2,258 prototypes for the third.
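To make the first two techniques concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each token is a fixed-length feature vector, that classification assigns the label of the nearest prototype, and that distance is Euclidean; none of these details is stated in the abstract. "Picked uniformly" is interpreted here as evenly spaced positions within each class, which is also an assumption. All function names are hypothetical. The third technique is not sketched because its search procedure is not specified here.

```python
import math
from collections import defaultdict


def group_by_class(tokens, labels):
    """Collect the tokens of each class, preserving their original order."""
    groups = defaultdict(list)
    for x, y in zip(tokens, labels):
        groups[y].append(tuple(x))
    return groups


def mean_prototypes(tokens, labels):
    """First method (sketch): one prototype per class, the average of its tokens."""
    protos = []
    for y, xs in group_by_class(tokens, labels).items():
        dim = len(xs[0])
        mean = tuple(sum(x[d] for x in xs) / len(xs) for d in range(dim))
        protos.append((mean, y))
    return protos


def uniform_prototypes(tokens, labels, per_class):
    """Second method (sketch): keep up to per_class tokens from each class.

    Assumption: "uniformly" means evenly spaced positions in the class's
    token list; uniform random sampling would be an equally valid reading.
    """
    protos = []
    for y, xs in group_by_class(tokens, labels).items():
        k = min(per_class, len(xs))
        step = len(xs) / k
        for i in range(k):
            protos.append((xs[int(i * step)], y))
    return protos


def classify(x, protos):
    """Assumed decision rule: label of the nearest prototype (Euclidean)."""
    return min(protos, key=lambda p: math.dist(x, p[0]))[1]


def accuracy(tokens, labels, protos):
    """Fraction of held-out tokens whose predicted label matches the tag."""
    hits = sum(classify(x, protos) == y for x, y in zip(tokens, labels))
    return hits / len(labels)
```

In the setting described above, prototypes would be built from the first part of the labeled frames and `accuracy` evaluated on the last part. The first method yields exactly one prototype per class, while `per_class` controls the prototype budget of the second.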