Attribute-based people search in surveillance environments
Daniel A. Vaquero, Rogerio S. Feris, et al.
WACV 2009
Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications such as smart glasses and IoT devices. We introduce PicoSAM2, a lightweight (1.3M parameters, 336M MACs) promptable visual segmentation model optimized for edge and in-sensor execution, including on the Sony IMX500. It builds on a depthwise separable U-Net and uses knowledge distillation and fixed-point prompt encoding to learn from the Segment Anything Model 2 (SAM2). On COCO and LVIS, it achieves 51.9% and 44.9% mIoU, respectively. The quantized model (1.22 MB) runs in 14.3 ms on the IMX500 and achieves ~86 MACs/cycle, which makes it the only model that meets both the memory and the compute constraints for in-sensor deployment. Distillation improves LVIS performance by +3.5% mIoU and +5.1% mAP. These results demonstrate that efficient, promptable segmentation is feasible directly on-camera, enabling privacy-preserving vision without cloud or host processing.
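PicoSAM2 builds on a depthwise separable U-Net. A back-of-the-envelope way to see why that choice keeps the MAC count low is to compare the multiply-accumulate cost of one standard convolution with that of a depthwise separable one (a k x k depthwise convolution followed by a 1 x 1 pointwise convolution). The sketch below assumes stride 1, "same" padding and no bias; the function names and the layer shape are hypothetical and are not taken from PicoSAM2 itself.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a k x k standard convolution with an h x w x c_out output.

    Assumes stride 1 and 'same' padding, so the output is h x w spatially.
    Every output value reads a k x k x c_in window of the input.
    """
    return h * w * c_out * c_in * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes channels into c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    h, w, c_in, c_out, k = 64, 64, 32, 64, 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard:            {std:,} MACs")
    print(f"depthwise separable: {sep:,} MACs")
    # The ratio simplifies to 1/c_out + 1/k^2 independent of h and w.
    print(f"ratio: {sep / std:.3f} (closed form: {1 / c_out + 1 / k**2:.3f})")
```

Dividing the two counts gives a ratio of 1/c_out + 1/k^2, so for a 3 x 3 kernel with 64 output channels the separable layer costs roughly 13% of the standard one. The real savings in PicoSAM2 depend on its actual layer shapes, which are not given here.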
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Pavel Kisilev, Daniel Freedman, et al.
ICPR 2012
Michelle X. Zhou, Fei Wang, et al.
ICMEW 2013