Publication
INTERSPEECH 2013
Conference paper

The IBM speech activity detection system for the DARPA RATS program

Abstract

We present the IBM speech activity detection system that was fielded in the phase 2 evaluation of the DARPA RATS (robust automatic transcription of speech) program. Key ingredients of the system are: multi-pass HMM Viterbi segmentation, fusion of multiple feature streams, file-based and speech-based normalization schemes, the use of regular and convolutional deep neural networks, and model fusion through frame-level score combination of channel-dependent models. These techniques were instrumental in achieving a 1.4% equal error rate on the RATS phase 2 evaluation data. Copyright © 2013 ISCA.
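As a rough illustration of two of the ingredients named above, frame-level score combination and Viterbi segmentation, the following is a minimal sketch and not the system described in the paper. It assumes each model emits one speech/non-speech log-likelihood ratio per frame, fuses them by a weighted average, and smooths the result with a two-state Viterbi pass that uses a fixed switch penalty. The function names `fuse_scores` and `viterbi_two_state`, the equal default weights, and the `switch_penalty` value are all hypothetical; the abstract does not specify the HMM topology, the fusion weights, or the number of passes.

```python
import numpy as np


def fuse_scores(streams, weights=None):
    """Weighted average of per-frame log-likelihood ratios (LLRs).

    streams: array-like of shape (n_models, n_frames).
    weights: optional per-model weights; equal weights are assumed if omitted.
    """
    scores = np.asarray(streams, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=float) @ scores


def viterbi_two_state(llr, switch_penalty=2.0):
    """Decode a 0 (non-speech) / 1 (speech) label per frame.

    Each frame contributes +llr/2 to the speech state and -llr/2 to the
    non-speech state; changing state costs `switch_penalty` (hypothetical value).
    """
    llr = np.asarray(llr, dtype=float)
    n = len(llr)
    emit = np.stack([-0.5 * llr, 0.5 * llr], axis=1)  # shape (n_frames, 2)
    score = emit[0].copy()
    back = np.zeros((n, 2), dtype=int)

    for t in range(1, n):
        new_score = np.empty(2)
        for s in (0, 1):
            stay = score[s]
            switch = score[1 - s] - switch_penalty
            if stay >= switch:
                back[t, s] = s
                new_score[s] = stay + emit[t, s]
            else:
                back[t, s] = 1 - s
                new_score[s] = switch + emit[t, s]
        score = new_score

    # Trace the best path backwards from the best final state.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

In this sketch the switch penalty plays the role of the HMM transition cost: a single low-scoring frame inside a speech region is absorbed into the surrounding segment instead of splitting it.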
