Conference paper
Text-Guided Few-Shot Semantic Segmentation with Training-Free Multimodal Feature Matching
Abstract
This paper addresses text-guided few-shot semantic segmentation (FSS), in which unseen novel classes are segmented using image and text references as in-context examples, without any training. We improve the quality and stability of the segmentation masks produced by FSS by incorporating the open-vocabulary capability of zero-shot semantic segmentation (ZSS) built on image-text foundation models. We propose a training-free approach based on multimodal feature matching, which segments a target image by identifying regions whose features match both the image and text references. Experimental results demonstrate that the proposed method outperforms state-of-the-art FSS and ZSS methods.
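The abstract does not specify the matching procedure, but the idea of training-free multimodal feature matching can be illustrated with a minimal sketch: fuse a reference image feature and a reference text feature into one prototype, score each target patch by cosine similarity against that prototype, and threshold the scores into a mask. All names, the fusion weight `alpha`, and the threshold below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def match_features(target_patches, ref_img_feat, ref_txt_feat,
                   alpha=0.5, thresh=0.6):
    """Toy training-free multimodal matcher (illustrative only).

    target_patches: (N, D) patch features of the target image.
    ref_img_feat:   (D,) feature pooled from the image reference.
    ref_txt_feat:   (D,) feature of the text reference.
    Returns a boolean mask of shape (N,), one entry per patch.
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    t = l2norm(target_patches)                       # unit patch features
    # Fuse the two reference modalities into one prototype vector.
    fused = l2norm(alpha * l2norm(ref_img_feat)
                   + (1.0 - alpha) * l2norm(ref_txt_feat))
    sim = t @ fused                                  # cosine similarity per patch
    return sim >= thresh                             # binary segmentation mask

# Toy usage with random features standing in for foundation-model embeddings.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
mask = match_features(patches, rng.normal(size=8), rng.normal(size=8))
```

In practice the patch and text features would come from a pretrained image-text encoder, and the per-patch mask would be reshaped back to the image grid.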