The all-distances sketch (ADS) has recently emerged as a promising paradigm of graph neighborhood sketching. An ADS is a probabilistic data structure that is defined for each vertex of a graph. ADSs facilitate accurate estimation of many useful indicators for network analysis with the guarantee of accuracy, and the ADSs for all the vertices in a graph can be computed in near-linear time. Because of these useful properties, ADS has attracted considerable attention. However, a critical drawback of ADS is its space requirement, which tends to be much larger than that of the graph itself. In the present study, we address this issue by designing a new graph sketching scheme, namely, sketch retrieval shortcuts (SRS). Although SRSs are more space-efficient than ADSs by an order of magnitude, an ADS of any vertex can be quickly retrieved from the SRSs. The retrieved ADSs can be used to estimate the aforementioned indicators in exactly the same manner as with plain ADSs, inheriting the same accuracy guarantee. Our experiments on real-world networks demonstrate the usefulness of SRSs as a practical back-end of large-scale graph data mining.

Filed under: Graph Mining and Social Networks | Mining Rich Data Types