Community detection in content-sharing social networks
Abstract
Network structure and content in microblogging sites like Twitter influence each other: user A on Twitter follows user B for the tweets that B posts on the network, and A may then re-tweet the content shared by B to his or her own followers. In this paper, we propose a probabilistic model that jointly models link communities and content topics by leveraging both the social graph and the content shared by users. We model a community as a distribution over users, use it as a source for topics of interest, and jointly infer both communities and topics using Gibbs sampling. While modeling communities using the social graph and modeling topics using content have each received a great deal of attention, only a few recent approaches try to model topics in content-sharing platforms using both content and the social graph. Our work differs from existing generative models in that we explicitly model the social graph of users along with the user-generated content, mimicking how the two co-evolve in content-sharing platforms. Recent studies have found Twitter to be more of a content-sharing network and less of a social network, and it seems hard to detect tightly knit communities from the follower-followee links. Still, the question of whether Twitter communities can be extracted using both links and content remains open. In this paper, we answer this question in the affirmative. Our model discovers coherent communities and topics, as shown by qualitative results on sub-graphs of Twitter users. Furthermore, we evaluate our model on the task of predicting follower-followee links. We show that joint modeling of links and content significantly improves link prediction performance on a sub-graph of Twitter (consisting of about 0.7 million users and over 27 million tweets), compared with generative models based only on structure or only on content, and with path-based methods such as Katz. Copyright 2013 ACM.