Attribute-based people search in surveillance environments
Daniel A. Vaquero, Rogerio S. Feris, et al.
WACV 2009
The IBM Research/Columbia team investigated a novel range of low-level and high-level features, and their combination, for the TRECVID Multimedia Event Detection (MED) task. We submitted four runs exploring methods for extracting, modeling, and fusing low-level features and hundreds of high-level semantic concepts. Run 1 built event detection models using Support Vector Machines (SVMs) trained on a large number of low-level features, establishing baseline performance for visual features from static video frames. Run 2 trained SVMs on classification scores generated by 780 visual, 113 action, and 56 audio high-level semantic classifiers and explored various temporal aggregation techniques, assessing performance across different kinds of high-level semantic information. Run 3 fused the low- and high-level feature information, providing insight into how complementary this information is for detecting events. Run 4 fused all of these methods and explored a novel Scene Alignment Model (SAM) algorithm that uses temporal information discretized by scene changes in the video.
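The abstract does not say which temporal aggregation or fusion schemes were used. As a minimal sketch only, assuming mean or max pooling of per-frame concept scores and a weighted average of per-video detection scores (both common choices, not confirmed as the authors' method), the two steps might look like the following. The function names `aggregate_scores` and `late_fusion` are hypothetical.

```python
import numpy as np


def aggregate_scores(frame_scores, method="mean"):
    """Pool a (num_frames, num_concepts) score matrix into one video-level vector.

    Assumption: mean/max pooling stands in for the unspecified
    temporal aggregation techniques mentioned in the abstract.
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    if method == "mean":
        return frame_scores.mean(axis=0)
    if method == "max":
        return frame_scores.max(axis=0)
    raise ValueError(f"unknown aggregation method: {method}")


def late_fusion(scores_by_model, weights=None):
    """Fuse per-video detection scores from several models by weighted averaging.

    Assumption: score-level (late) fusion; the abstract does not state
    how the low- and high-level models were actually combined.
    """
    stacked = np.vstack([np.asarray(s, dtype=float) for s in scores_by_model])
    if weights is None:
        weights = np.ones(stacked.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ stacked


# Two frames, two concept classifiers (illustrative numbers only).
frame_scores = [[0.2, 0.8], [0.6, 0.4]]
video_mean = aggregate_scores(frame_scores, "mean")
video_max = aggregate_scores(frame_scores, "max")

# Scores for two videos from a low-level model and a high-level model.
fused = late_fusion([[0.2, 0.9], [0.4, 0.5]])
```

In a fuller pipeline, the pooled video-level vectors would serve as SVM input features, and the fusion weights would be tuned on held-out data rather than left uniform.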
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Pavel Kisilev, Daniel Freedman, et al.
ICPR 2012
Sudeep Sarkar, Kim L. Boyer
Computer Vision and Image Understanding