Multi-concept learning with large-scale multimedia lexicons
Abstract
Multi-concept learning is an important problem in multimedia content analysis and retrieval. It connects two key components of the multimedia semantic ecosystem: the multimedia lexicon and semantic concept detection. This paper aims to answer two questions related to multi-concept learning: does a large-scale lexicon help concept detection, and how many concepts are enough? Our study on a large-scale lexicon shows that more concepts do improve detection performance. The gain is statistically significant with more than 40 concepts and saturates at over 200. We also compare several modeling choices for multi-concept detection. Generative models such as Naive Bayes perform robustly across lexicon choices and sizes. Discriminative models such as logistic regression and SVM perform comparably on specially selected concept sets, yet tend to over-fit on large lexicons. © 2008 IEEE.