Attribute-based people search in surveillance environments
Daniel A. Vaquero, Rogerio S. Feris, et al.
WACV 2009
For multimedia retrieval to be effective, the semantic gap needs to be bridged. Statistical learning techniques provide a robust framework for learning representations of semantic concepts from visual features. The bottleneck is the need to annotate a large number of training samples to construct robust models. We present a novel approach in which annotations may be entered at a coarser spatial granularity while the concept is still learnt at a finer granularity. This can speed up annotation significantly and provide bootstrapping. We show that it is possible to learn representations of concepts occurring at the regional level from annotations of several images, where the annotations are provided only at the global level. The ambiguity about which region carries the concept can be handled by the multiple instance learning paradigm. We demonstrate this using the TREC 2001 Corpus for the concept Sky.