Accepted Papers

Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop

Yutao Zhang (Tsinghua University); Fanjin Zhang (Tsinghua University); Peiran Yao (Tsinghua University); Jie Tang (Tsinghua University)

AMiner 1 is a free online academic search and mining system, having collected more than 130,000,000 researcher profiles and over 200,000,000 papers from multiple publication databases [25].

In this paper, we present the implementation and deployment of name disambiguation , a core component in AMiner. The problem has been studied for decades but remains largely unsolved. In AMiner, we did a systemic investigation into the problem and propose a comprehensive framework to address the problem. We propose a novel representation learning method by incorporating both global and local information and present an end-to-end cluster size estimation method that is significantly better than traditional BIC-based method. To improve accuracy, we involve human annotators into the disambiguation process. We carefully evaluate the proposed framework on real-world large data and experimental results show that the proposed solution achieves clearly better performance (+7-35% in terms of F1-score) than several state-of-the-art methods including GHOST [5], Zhang et al. [33], and Louppe et al. [17].

Finally, the algorithm has been deployed in AMiner to deal with the disambiguation problem at the billion scale, which further demonstrates both effectiveness and efficiency of the proposed framework.

Promotional Video