Betweenness centrality (BWC) is a fundamental centrality mea-sure in social network analysis. Given a large-scale network, how can we find the most central nodes? This question is of great importance to many key applications that rely on BWC, including community detection and understanding graph vulnerability. Despite the large amount of work on scalable approximation algorithm design for BWC, estimat-ing BWC on large-scale networks remains a computational challenge.

In this paper, we study the Centrality Maximization prob-lem (CMP): given a graph G = (V, E) and a positive integer k, find a set S∗ ⊆ V that maximizes BWC subject to the car-dinality constraint |S∗| ≤ k. We present an efficient randomized algorithm that provides a (1 − 1/e − )-approximation with high probability, where  > 0. Our results improve the current state-of-the-art result [40]. Furthermore, we provide the first theoretical evidence for the validity of a crucial assumption in betweenness centrality estimation, namely that
in real-world networks O(|V|2) shortest paths pass through the top-k central nodes, where k is a constant. This also explains why our algorithm runs in near linear time on real-world networks. We also show that our algorithm and anal-ysis can be applied to a wider range of centrality measures, by providing a general analytical framework.

On the experimental side, we perform an extensive experimental analysis of our method on real-world networks, demonstrate its accuracy and scalability, and study different properties of central nodes. Then, we compare the sampling method used by the state-of-the-art algorithm with our method. Furthermore, we perform a study of BWC in time evolving networks, and see how the centrality of the central nodes in the graphs changes over time. Finally, we compare the performance of the stochastic Kronecker model [28] to real data, and observe that it generates a similar growth pattern.

Filed under: Big Data | Graph Mining and Social Networks