Unfolded recurrent neural networks for speech recognition
George Saon, Hagen Soltau, et al.
INTERSPEECH 2014
We review the performance of a new two-stage cascaded machine learning approach for rescoring keyword search output for low-resource languages. In the first stage, Confusion Networks (CNs) are rescored for improved Automatic Speech Recognition (ASR) by reranking the arcs of each confusion bin. In the second stage, we generate keyword search hypotheses from the rescored ASR output and rescore them using logistic regression classifiers to distinguish true hits from false alarms. We compare the performance of our system with state-of-the-art rescoring techniques, including probability-of-false-alarm normalization, exponential normalization, rank-normalized posterior scores, and sum-to-one normalization, and show promising results. Experimental validation is performed using the Term Weighted Value (TWV) metric on four corpora from the IARPA-Babel program for keyword search on low-resource languages: Assamese, Bengali, Lao, and Zulu.
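As a rough illustration of one of the baseline techniques named above, sum-to-one normalization is commonly described as dividing each hit's posterior score by the sum of the scores of all hits for the same keyword. The sketch below assumes that formulation; the function name `sum_to_one`, the `(keyword, score)` input layout, and the optional exponent `gamma` are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def sum_to_one(hits, gamma=1.0):
    """Normalize keyword-search hit scores so they sum to one per keyword.

    hits  : list of (keyword, score) pairs, with score >= 0 (assumed layout)
    gamma : optional sharpening exponent (an assumption; 1.0 means none)
    Returns a list of (keyword, normalized_score) in the same order.
    """
    # Accumulate the (optionally exponentiated) score mass of each keyword.
    totals = defaultdict(float)
    for keyword, score in hits:
        totals[keyword] += score ** gamma

    normalized = []
    for keyword, score in hits:
        total = totals[keyword]
        # Guard against a keyword whose hits all have zero score.
        value = (score ** gamma) / total if total > 0 else 0.0
        normalized.append((keyword, value))
    return normalized


if __name__ == "__main__":
    example_hits = [("kw_a", 0.2), ("kw_a", 0.6), ("kw_b", 0.5)]
    for keyword, value in sum_to_one(example_hits):
        print(keyword, round(value, 3))
```

Under this formulation a keyword with a single hit always receives a normalized score of one, which is one reason rare keywords tend to benefit from this kind of normalization under a per-keyword metric such as TWV.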