P.S. Gopalakrishnan, D. Kanevsky, et al.
ICASSP 1989
The performance of a large vocabulary speech recognition system is critically tied to the quality of the acoustic prototypes established in the relevant feature space(s). This is especially true when only a limited amount of training data is available to extract information about pronunciation variability. To better account for co-articulation effects, we describe a supervised strategy for the construction of context-sensitive acoustic prototypes. The idea is to incorporate contextual supervision to relate the allophonic models to their acoustic manifestations. This makes better use of the available training data while allowing for a short design turnaround time. The performance of this method is illustrated on an isolated-utterance speech recognition task with a vocabulary of 20,000 words.
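The abstract does not spell out how the context-sensitive prototypes are built. As a rough, hedged illustration of the general idea (not the paper's actual procedure), the sketch below averages supervised-aligned feature vectors per allophonic context and backs off to the context-independent phone prototype when a context is rarely seen; the function name, the frame tuple layout, and the min_count threshold are all assumptions introduced here for illustration.

# Hedged sketch of context-sensitive prototype construction, assuming "frames"
# is a list of (feature_vector, phone, left_ctx, right_ctx) tuples obtained from
# a supervised alignment of the training data. All names are hypothetical.
from collections import defaultdict
import numpy as np

def build_context_prototypes(frames, min_count=10):
    """Group aligned frames by allophonic context and average them into prototypes.

    Contexts observed fewer than `min_count` times fall back to the
    context-independent phone prototype, a simple stand-in for the data
    sharing that contextual supervision is meant to provide.
    """
    by_context = defaultdict(list)   # (left, phone, right) -> feature vectors
    by_phone = defaultdict(list)     # phone -> feature vectors (back-off pool)
    for vec, phone, left, right in frames:
        by_context[(left, phone, right)].append(vec)
        by_phone[phone].append(vec)

    phone_protos = {p: np.mean(v, axis=0) for p, v in by_phone.items()}
    prototypes = {}
    for (left, phone, right), vecs in by_context.items():
        if len(vecs) >= min_count:
            prototypes[(left, phone, right)] = np.mean(vecs, axis=0)
        else:
            prototypes[(left, phone, right)] = phone_protos[phone]
    return prototypes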