This paper proposes a new method for joint audio-video talker localization that exploits the reliability of the individual localizers: audio, motion detection, and skin-color detection. The reliability information is estimated separately from the audio and video data. The proposed method then uses this reliability information, in conjunction with a simple summing voter, to dynamically discriminate erroneous localizer outputs while fusing the localization results. Based on the voter output, a majority rule makes the final decision on the active talker's current location. The results show that adding reliability information during fusion improves localization performance compared with audio-only, motion-detection-only, skin-color-detection-only, and straight-summing joint audio-video fusion methods. The computational complexity of the proposed method is comparable to that of the existing ones.
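The fusion scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each localizer (audio, motion detection, skin-color detection) reports a candidate region index together with a reliability score in [0, 1], that the summing voter accumulates reliability-weighted votes per region, and that the majority rule requires the winning region to hold more than half of the total vote mass. The function name and data layout are hypothetical.

```python
from collections import defaultdict

def fuse_localizers(estimates):
    """Reliability-weighted fusion sketch (illustrative, not the paper's code).

    estimates: list of (region_index, reliability) pairs, one per localizer.
    Returns the winning region index, or None if no region wins a majority.
    """
    scores = defaultdict(float)
    for region, reliability in estimates:
        # Summing voter: unreliable localizers contribute little weight,
        # so their erroneous outputs are effectively discounted.
        scores[region] += reliability
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Majority rule on the voter output: require a strict majority
    # of the total reliability mass before accepting the decision.
    if scores[best] <= sum(scores.values()) / 2:
        return None
    return best
```

For example, if the audio and motion localizers agree on region 3 with high reliability while skin-color detection points elsewhere with low reliability, region 3 wins; if two localizers split their weight evenly, no majority exists and the decision is withheld.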

Additional Metadata
Persistent URL dx.doi.org/10.1109/TIM.2004.831181
Journal IEEE Transactions on Instrumentation and Measurement
Citation
Lo, D. (David), Goubran, R., Dansereau, R., Thompson, G. (Graham), & Schulz, D. (Dieter). (2004). Robust joint audio-video localization in video conferencing using reliability information. IEEE Transactions on Instrumentation and Measurement, 53(4), 1132–1139. doi:10.1109/TIM.2004.831181