A multimodal speaker detection and tracking system for teleconferencing

Billibon H. Yoshimi; Gopal S. Pingali

doi:10.1145/641007.641100

MM 2002

Conference paper

01 Dec 2002

Abstract

A serious problem with both the audio and video conferencing facilities available today is the difficulty of determining who is speaking when there are many participants. There is a strong need to develop meeting room infrastructure and teleconference facilities that improve the sense of presence and participation experienced in remote meetings. We present a distributed multimodal tracking system that uses multiple cameras and microphones to automatically select the current speaker among multiple meeting participants. The system actively obtains and transmits video showing a good view of the selected speaker. The tracking system is integrated into a web-based video conferencing application that connects seven meeting rooms around the globe. An important part of designing such a system is determining sensor placement and configuration through systematic experiments in the actual rooms where the system is deployed.
