Discriminative reranking for LVCSR leveraging invariant structure
Abstract
An invariant structure is a long-span acoustic representation in which acoustic variations caused by non-linguistic factors are effectively removed from speech. This paper presents a new method that uses invariant structures as features for discriminative reranking in Large Vocabulary Continuous Speech Recognition (LVCSR). First, we use a traditional HMM-based LVCSR system to obtain a list of N-best candidates with phone alignments, and we construct an invariant structure for each candidate from its phone alignment. Here, the invariant structure consists of the lengths between every pair of phonemes in the candidate. We then estimate a score for each phoneme pair in the invariant structure and rerank the N-best candidates by a weighted sum of the phoneme-pair scores, where the weights are trained discriminatively with the averaged perceptron. Experimental results show a relative CER improvement of 6.69% over the baseline HMM-based LVCSR system.
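The reranking step can be illustrated with a minimal sketch of an averaged perceptron over N-best lists. It is not the system evaluated in this paper. It assumes that the phoneme-pair scores have already been computed and stored per candidate in a dictionary, and that the training target for each utterance is the candidate with the fewest errors. The names `score`, `rerank`, `train_averaged_perceptron`, the `"feats"` and `"errors"` fields, the `"hmm"` baseline feature, and the toy values are all hypothetical.

```python
from collections import defaultdict


def score(weights, feats):
    """Weighted sum of feature values (e.g. phoneme-pair scores)."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def rerank(weights, nbest):
    """Return the index of the highest-scoring candidate in an N-best list."""
    return max(range(len(nbest)), key=lambda i: score(weights, nbest[i]["feats"]))


def train_averaged_perceptron(data, epochs=5):
    """Train weights on a list of N-best lists and return their average.

    Assumption: each candidate is a dict with "feats" (feature -> value)
    and "errors" (error count against the reference transcript).
    """
    w = defaultdict(float)      # current weights
    total = defaultdict(float)  # running sum of weights, for averaging
    steps = 0
    for _ in range(epochs):
        for nbest in data:
            # Oracle: the candidate with the fewest errors.
            oracle = min(range(len(nbest)), key=lambda i: nbest[i]["errors"])
            pred = rerank(w, nbest)
            # Update only when the top-ranked candidate is worse than the oracle.
            if nbest[pred]["errors"] > nbest[oracle]["errors"]:
                for k, v in nbest[oracle]["feats"].items():
                    w[k] += v
                for k, v in nbest[pred]["feats"].items():
                    w[k] -= v
            # Accumulate after every example so that the average covers all steps.
            for k, v in w.items():
                total[k] += v
            steps += 1
    return {k: v / steps for k, v in total.items()}


# Toy data: two utterances with two candidates each (values are made up).
data = [
    [
        {"feats": {"hmm": 1.0, ("a", "i"): 0.2}, "errors": 2},
        {"feats": {"hmm": 0.8, ("a", "i"): 0.9}, "errors": 0},
    ],
    [
        {"feats": {"hmm": 1.0, ("k", "o"): 0.1}, "errors": 1},
        {"feats": {"hmm": 0.9, ("k", "o"): 0.7}, "errors": 0},
    ],
]

weights = train_averaged_perceptron(data, epochs=5)
best = [rerank(weights, nbest) for nbest in data]
```

In this sketch, `best` holds the index of the selected candidate for each toy utterance. Averaging the weights over all training steps, rather than keeping only the final weights, is the standard way to make perceptron training less sensitive to the order of the examples.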