KDD Papers

Graph Edge Partitioning via Neighborhood Heuristic

Chenzi Zhang (the University of Hong Kong);Fan Wei (Stanford University);Qin Liu (Huawei Noah’s Ark Lab);Zhihao Gavin Tang (the University of Hong Kong);Zhenguo Li (Huawei Noah’s Ark Lab)


Abstract

We consider the edge partitioning problem that partitions the edges of an input graph into multiple balanced components, while minimizing the total number of vertices replicated (one vertex might appear in more than one partition). This problem is critical in minimiz- ing communication costs and running time for several large-scale distributed graph computation platforms (e.g., PowerGraph, Spark GraphX). We first prove that this problem is NP-hard, and then present a new partitioning heuristic with polynomial running time. We provide a worst-case upper bound of replication factor for our heuristic on general graphs. To our knowledge, we are the first to provide such bound for edge partitioning algorithms on general graphs. Applying this bound to random power-law graphs greatly improves the previous bounds of expected replication factor. Ex- tensive experiments demonstrated that our partitioning algorithm consistently produces much smaller replication factors on various benchmark data sets than the state-of-the-art. When deployed in the production graph engine, PowerGraph, in average it reduces replication factor, communication, and running time by 54%, 66%, and 21%, respectively. Source code available at https://www.dropbox.com/s/xqceetbwdxhabqp/edgepart.zip?dl=0


Comments