Suchana Datta, Debasis Ganguly, et al.
FIRE 2020
Clustering short documents such as tweets effectively is difficult because they lack sufficient semantic context. Word embeddings can help compensate for this lack of context. However, training word vectors itself relies on sufficiently rich contexts from which to learn word associations. To get around this problem, we propose a novel word vector training approach that leverages topically similar tweets to better learn word associations. We evaluate the proposed approach by clustering a collection of disaster-related tweets and observe that it improves clustering effectiveness by up to 14%.
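The general idea of borrowing context from topically similar tweets can be illustrated with a minimal sketch. This is not the training procedure from the paper, which the abstract does not detail. It assumes, purely for illustration, that topical similarity is approximated by Jaccard overlap of token sets and that word associations are approximated by co-occurrence counts. The function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Simplification: lowercase, split on whitespace, keep alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def jaccard(a, b):
    # Token-set overlap, used here as an assumed stand-in for topical similarity.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_contexts(tweets, threshold=0.2):
    # Append to each tweet the tokens of every other tweet that is
    # similar enough, so that each word is seen in a richer context.
    tokens = [tokenize(t) for t in tweets]
    expanded = []
    for i, doc in enumerate(tokens):
        context = list(doc)
        for j, other in enumerate(tokens):
            if i != j and jaccard(doc, other) >= threshold:
                context.extend(other)
        expanded.append(context)
    return expanded


def cooccurrence_counts(contexts):
    # Count, for each unordered word pair, how many contexts contain both words.
    counts = Counter()
    for ctx in contexts:
        for a, b in combinations(sorted(set(ctx)), 2):
            counts[(a, b)] += 1
    return counts


tweets = [
    "flood in kerala",
    "kerala flood rescue teams",
    "earthquake hits nepal",
]
expanded = expand_contexts(tweets)
plain = cooccurrence_counts([tokenize(t) for t in tweets])
rich = cooccurrence_counts(expanded)
```

In this toy example, the words "in" and "rescue" never co-occur within a single tweet, but they do co-occur once the two flood-related tweets share their contexts. The expanded contexts could then be passed to any word embedding trainer in place of the original tweets.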