Centralities in large networks: Algorithms and observations
Abstract
Node centrality measures are important in a large number of graph applications, from search and ranking to social and biological network analysis. In this paper, we study node centrality for very large graphs with up to billions of nodes and edges. Various definitions of centrality have been proposed, ranging from the very simple (e.g., node degree) to the more elaborate. However, measuring centrality in billion-scale graphs poses several challenges. Many of the "traditional" definitions, such as closeness and betweenness, were not designed with scalability in mind, so it is very difficult, if not impossible, to compute them both accurately and efficiently at this scale. We propose centrality measures suitable for very large graphs, as well as scalable methods to compute them effectively. More specifically, we propose effective closeness and LINERANK, both of which are designed for billion-scale graphs. We also develop algorithms to compute the proposed centrality measures in MAPREDUCE, a modern paradigm for large-scale, distributed data processing. We present extensive experimental results on both synthetic and real datasets, which demonstrate the scalability of our approach to very large graphs and reveal interesting findings and anomalies. Copyright © SIAM.