A statistical modeling approach to content based video retrieval
Milind R. Naphade, Sankar Basu, et al.
ICPR 2008
A new technique for constructing Markov models for the acoustic representation of words is described. Word models are constructed from models of sub-word units called fenones. Fenones represent very short speech events, and are obtained automatically through the use of a vector quantizer. The fenonic baseform for a word—i.e., the sequence of fenones used to represent the word—is derived automatically from one or more utterances of that word. Since the word models are all composed from a small inventory of sub-word models, training for large-vocabulary speech recognition systems can be accomplished with a small training script. A method for combining phonetic and fenonic models is presented. Results of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks are reported. Comparative results with phonetics-based Markov models and template-based DP matching are also given. © 1993 IEEE
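The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes a fixed codebook of acoustic prototypes, a squared-Euclidean nearest-neighbor vector quantizer, and a baseform taken from a single utterance as the raw sequence of quantizer labels. The function names, the toy codebook, and the toy frames are hypothetical and chosen only for illustration; the abstract does not specify the quantizer, the feature vectors, or how several utterances are combined.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Label each acoustic frame with the index of its nearest codeword.

    Assumption: one fenone per codeword, so each label names a fenone.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def single_utterance_baseform(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> List[int]:
    """Hypothetical helper: use the label sequence of one utterance as the baseform."""
    return quantize(frames, codebook)


# Toy data (illustrative only): three codewords, four two-dimensional frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
utterance = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.7, 0.2)]

baseform = single_utterance_baseform(utterance, codebook)
print(baseform)  # [0, 1, 1, 2]
```

In a full system each label in the baseform would be replaced by the small Markov model of the corresponding fenone, and the word model would be the concatenation of those models; that part is omitted here.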
Xiaohui Shen, Gang Hua, et al.
FG 2011
Simona Rabinovici-Cohen, Naomi Fridman, et al.
Cancers
Lalit R Bahl, Steven V. De Gennaro, et al.
IEEE Transactions on Speech and Audio Processing