D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes
Alessio Conte (National Institute of Informatics, Japan); Tiziano De Matteis (University of Pisa); Daniele De Sensi (University of Pisa); Roberto Grossi (Universita' di Pisa); Andrea Marino (Universita' di Pisa); Luca Versari (Universita' di Pisa)
This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today’s real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.