Contextual Word Embedding: A Case Study in Clustering Tweets about Emergency Situations

Debasis Ganguly; Kripabandhu Ghosh

doi:10.1145/3184558.3186935

WWW 2018

Conference paper

23 Apr 2018

Abstract

Effective clustering of short documents, such as tweets, is difficult because they lack sufficient semantic context. Word embedding is an effective technique for addressing this lack of semantic context. However, training word vector embeddings in turn relies on the availability of sufficient context to learn word associations. To address this problem, we propose a novel word vector training approach that leverages topically similar tweets to better learn word associations. We test the proposed word embedding approach by clustering a collection of tweets about disasters. We observe that the proposed method improves clustering effectiveness by up to 14%.
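The abstract does not say how topically similar tweets are identified or how they enter word vector training. The following is a purely illustrative sketch of the general idea, not the authors' algorithm. It assumes that TF-IDF cosine similarity selects each tweet's k most similar tweets, and that their words are pooled into the tweet's context to produce skip-gram style (target, context) pairs. The function names, the smoothed IDF formula, and the parameter `k` are all hypothetical choices.

```python
import math
from collections import Counter

def tfidf_vectors(tweets):
    """Sparse TF-IDF vectors (term -> weight) for tokenized tweets.
    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1."""
    n = len(tweets)
    df = Counter(term for toks in tweets for term in set(toks))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in Counter(toks).items()} for toks in tweets]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_neighbors(vecs, i, k):
    """Indices of the k tweets most similar to tweet i (zero-similarity tweets dropped)."""
    scores = sorted(((cosine(vecs[i], vecs[j]), j)
                     for j in range(len(vecs)) if j != i), reverse=True)
    return [j for s, j in scores[:k] if s > 0]

def augmented_pairs(tweets, k=2):
    """Hypothetical context augmentation: pair each word with the other words
    of its own tweet plus all words borrowed from its k nearest tweets."""
    vecs = tfidf_vectors(tweets)
    pairs = []
    for i, toks in enumerate(tweets):
        borrowed = [w for j in top_k_neighbors(vecs, i, k) for w in tweets[j]]
        for pos, word in enumerate(toks):
            own = toks[:pos] + toks[pos + 1:]
            pairs.extend((word, ctx) for ctx in own + borrowed if ctx != word)
    return pairs
```

The resulting pairs could then feed any skip-gram trainer (for example, one using negative sampling). A short tweet such as `["flood", "rescue", "boat"]` gains context words like `"shelter"` from a related tweet that it would never see on its own.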
