Improved voice activity detection using static harmonic features

Takashi Fukuda; Osamu Ichikawa; Masafumi Nishimura

doi:10.1109/ICASSP.2010.5495598

ICASSP 2010

Conference paper

14 Mar 2010

Abstract

Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. We previously proposed a statistical-model-based VAD that uses long-term temporal information in speech and shows good robustness against noise in an automobile environment. To improve it further, this paper describes a new method that exploits harmonic structure information with statistical models. In our approach, local peaks considered to be harmonic structures are extracted without explicit pitch detection or voiced-unvoiced classification. The proposed method, which combines long-term temporal and static harmonic features, led to considerable improvements under low-SNR conditions in our VAD testing. In addition, the word error rate was reduced by 29.1% in a test that included a full ASR system. ©2010 IEEE.
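
To illustrate extracting local spectral peaks without a pitch estimate, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not specify the peak-picking rule, so the function name `local_peak_mask`, the parameter `min_margin`, and the toy spectrum are all assumptions made for illustration only.

```python
import math


def local_peak_mask(log_spectrum, min_margin=0.0):
    """Return a 0/1 mask marking bins that exceed both neighbours.

    Hypothetical helper: a bin is marked when it is strictly higher than
    both adjacent bins and exceeds the higher neighbour by at least
    min_margin. No pitch detection or voicing decision is involved.
    """
    mask = [0] * len(log_spectrum)
    # The first and last bins have only one neighbour, so they are skipped.
    for k in range(1, len(log_spectrum) - 1):
        left, mid, right = log_spectrum[k - 1], log_spectrum[k], log_spectrum[k + 1]
        if mid > left and mid > right and mid - max(left, right) >= min_margin:
            mask[k] = 1
    return mask


# Toy log-power spectrum (assumed data): strong energy at every 4th bin,
# loosely imitating evenly spaced harmonics over a low noise floor.
toy = [math.log(1.0 + (10.0 if k % 4 == 0 else 0.1)) for k in range(17)]
peaks = local_peak_mask(toy)
```

In this sketch the resulting mask could serve as a per-frame (static) feature vector passed to a statistical model; how the paper actually turns the peaks into features is not stated in the abstract.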
