Cross-domain robust acoustic training
Abstract
This paper describes our efforts toward cross-domain acoustic training for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We used weighted multi-style training, pooling limited telephony landline and cellular data with down-sampled wide-band clean data, to develop better hybrid acoustic models. We explored the effect of decision tree size on accuracy, which varied by approximately 10%. The results show that, for a fixed number of parameters, a system with fewer context-dependent HMM states yields better accuracy, which motivates a smaller phone-set design. We then investigated the performance degradation of two reduced phone sets for Spanish. Based on these studies, we were able to develop a hybrid system for 8 kHz close-talking microphone, telephony landline, and cellular phone environments. The acoustic model is evaluated on both flat-grammar tasks (digit and name at department) and language-model tasks (ATIS and general dictation) using the IBM ViaVoice product engine.