Meta Structure: Computing Relevance in Large Heterogeneous Information Networks
Zhipeng Huang*, University of Hong Kong; Yudian Zheng, The University of Hong Kong; Reynold Cheng; Yizhou Sun, Northeastern Univ; Nikos Mamoulis; Xiang Li, The University of Hong Kong
A heterogeneous information network (HIN) is a graph model in which objects and edges are annotated with types. Large and complex databases, such as YAGO and DBLP, can be modeled as HINs. A fundamental problem in HINs is the computation of closeness, or relevance, between two HIN objects. Relevance measures can be used in various applications, including entity resolution, recommendation, and information retrieval. Several studies have investigated the use of HIN information for relevance computation; however, most of them utilize only simple structures, such as paths, to measure the similarity between objects. In this paper, we propose to use a meta structure, which is a directed acyclic graph of object types connected by edge types, to measure the proximity between objects. The strength of a meta structure is that it can describe complex relationships between two HIN objects (e.g., two papers in DBLP that share the same authors and topics). We develop three relevance measures based on meta structures. Because of the computational complexity of these measures, we further design an algorithm, with supporting data structures, to evaluate them. Our extensive experiments on YAGO and DBLP show that meta structure-based relevance is more effective than state-of-the-art approaches and can be computed efficiently.
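As a rough illustration of the DBLP example in the abstract, the sketch below counts how many ways two papers are connected through both a shared author and a shared topic. It is a toy under stated assumptions, not one of the paper's three relevance measures: the edge list, the edge-type names `written_by` and `mentions`, and the function `count_instances` are all hypothetical.

```python
# Hypothetical toy HIN: each edge is (source object, edge type, target object).
EDGES = [
    ("p1", "written_by", "a1"),
    ("p1", "written_by", "a2"),
    ("p1", "mentions", "t1"),
    ("p2", "written_by", "a1"),
    ("p2", "written_by", "a2"),
    ("p2", "mentions", "t1"),
    ("p3", "written_by", "a1"),
    ("p3", "mentions", "t2"),
]


def neighbors(node, edge_type):
    """Return the objects reachable from `node` over edges of `edge_type`."""
    return {dst for src, etype, dst in EDGES if src == node and etype == edge_type}


def count_instances(src_paper, dst_paper):
    """Count subgraphs matching the shape Paper -> {Author, Topic} -> Paper.

    An instance needs one shared author AND one shared topic, so the count
    is the number of (shared author, shared topic) combinations.
    """
    shared_authors = neighbors(src_paper, "written_by") & neighbors(dst_paper, "written_by")
    shared_topics = neighbors(src_paper, "mentions") & neighbors(dst_paper, "mentions")
    return len(shared_authors) * len(shared_topics)
```

In this toy data, `p1` and `p3` share an author but no topic, so an author-only path would connect them while the combined structure would not; that difference is the kind of distinction a path alone cannot express.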
Filed under: Graph Mining and Social Networks