Layered dynamic mixture model for pattern discovery in asynchronous multi-modal streams
Abstract
We propose a layered dynamic mixture model for asynchronous multi-modal fusion, aimed at unsupervised pattern discovery in video. The lower layer converts the audio-visual streams into mid-level labels using generative temporal structures such as a hierarchical hidden Markov model, and models correlations in the accompanying text with probabilistic latent semantic analysis. The upper layer fuses the statistical evidence across these diverse modalities with a flexible meta-mixture model that assumes only loose temporal correspondence. Evaluation on a large news database shows that multi-modal clusters correspond to news topics better than audio-visual clusters alone; novel analysis techniques suggest that meaningful clusters emerge when the salient features predicted by the model agree with those observed in the story clusters.
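To make the upper-layer fusion idea concrete, the following is a minimal, hypothetical sketch (not the authors' code): each story is summarized by per-modality histograms of mid-level labels, assumed to come from the lower-layer HHMM and pLSA stages, and a single shared mixture assignment fuses the evidence across modalities via EM. All names (`fit_meta_mixture`, the toy data) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_meta_mixture(hists, n_clusters=3, n_iter=50, eps=1e-9):
    """hists: list (one per modality) of (n_stories, vocab_m) count arrays."""
    n_stories = hists[0].shape[0]
    pi = np.full(n_clusters, 1.0 / n_clusters)  # cluster priors
    # Per-modality label distributions, one multinomial per cluster.
    thetas = [rng.dirichlet(np.ones(h.shape[1]), n_clusters) for h in hists]
    for _ in range(n_iter):
        # E-step: sum log-likelihoods across modalities; only story-level
        # co-occurrence links the streams (loose temporal correspondence).
        log_r = np.log(pi)[None, :]
        for h, theta in zip(hists, thetas):
            log_r = log_r + h @ np.log(theta + eps).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update priors and per-modality label distributions.
        pi = r.mean(axis=0)
        thetas = []
        for h in hists:
            t = r.T @ h + eps
            thetas.append(t / t.sum(axis=1, keepdims=True))
    return pi, thetas, r

# Toy usage: two modalities with different mid-level label vocabularies.
audio_visual = rng.integers(0, 5, size=(20, 8)).astype(float)
text = rng.integers(0, 5, size=(20, 12)).astype(float)
pi, thetas, resp = fit_meta_mixture([audio_visual, text])
print(resp.argmax(axis=1))  # hard cluster assignment per story
```

The key design choice this sketch illustrates is that the modalities are never aligned frame by frame; they interact only through the shared story-level cluster responsibility, which is one way to realize the loose-correspondence assumption stated in the abstract.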