Publication
ICASSP 2005
Conference paper

Blind change detection for audio segmentation

View publication

Abstract

Automatic segmentation of audio streams according to speaker identities, environmental and channel conditions has become an important preprocessing step for speech recognition, speaker recognition, and audio data mining. In most previous approaches, the automatic segmentation was evaluated in terms of the performance of the final system like the word error rate for speech recognition systems. In many applications like online audio indexing, and information retrieval systems, the actual boundaries of the segments are required. Therefore we present an approach based on the cumulative sum (CuSum) algorithm for automatic segmentation which minimizes the missing probability for a given false alarm rate. In this paper, we compare the CuSum algorithm to the Bayesian information criterion (BIC) algorithm, and a generalization of the Kolmogorov-Smirnov's test for automatic segmentation of audio streams. We present a two-step variation of the three algorithms which improves the performance significantly. We present also a novel approach that combines hypothesized boundaries from the three algorithms to achieve the final segmentation of the audio stream. Our experiments on the 1998 Hub4 broadcast news show that a variation of the CuSum algorithm significantly outperforms the other two approaches and that combining the three approaches using a voting scheme improves the performance slightly compared to using the a two-step variation of the CuSum algorithm alone. © 2005 IEEE.

Date

Publication

ICASSP 2005

Authors

Share