How LinkedIn Economic Graph Bonds Information and Product: Applications in LinkedIn Salary
Xi Chen (LinkedIn Corporation); Yiqun Liu (LinkedIn Corporation); Liang Zhang (LinkedIn Corporation); Krishnaram Kenthapadi (LinkedIn Corporation)
The LinkedIn Salary product was launched in late 2016 with the goal of providing insights on compensation distribution to job seekers, so that they can make more informed decisions when discovering and assessing career opportunities. The compensation insights are based on data collected from LinkedIn members and aggregated in a privacy-preserving manner. Given the simultaneous desire to compute robust, reliable insights and to provide insights to as many job seekers as possible, a key challenge is to reliably infer insights at the company level when there is limited or no data. We propose a two-step framework that addresses this problem using a novel semantic representation of companies (Company2vec) and a Bayesian statistical model. Our approach makes use of the rich information present in the LinkedIn Economic Graph and, in particular, relies on the intuition that two companies are likely to be similar if employees are very likely to transition from one company to the other and vice versa. We compute embeddings for companies by analyzing LinkedIn members’ company transition data using machine learning algorithms, then compute pairwise similarities between companies based on these embeddings, and finally incorporate the company similarities, in the form of peer company groups, into the proposed Bayesian statistical model to predict insights at the company level. We perform extensive validation using several evaluation techniques and show that we can significantly increase the coverage of insights while also slightly improving their quality. For example, we were able to compute salary insights for 35 times as many title-region-company combinations in the U.S. as in previous work, corresponding to 4.9 times as many monthly active users. Finally, we highlight the lessons learned from the practical deployment of our system.
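As a rough illustration of the two-step idea described above, the following Python sketch forms a peer company group from embedding similarities and then shrinks a sparse company-level estimate toward a peer-derived prior. Everything in it is an assumption made for illustration: the toy two-dimensional vectors stand in for learned Company2vec embeddings, the names `EMBEDDINGS`, `cosine`, `peer_group`, and `shrunk_mean` are hypothetical, and the weighted-average shrinkage (a simple normal-normal conjugate update with the prior counted as pseudo-observations) is a generic stand-in, not the Bayesian model proposed in the paper.

```python
import math

# Hypothetical toy embeddings; real Company2vec vectors would be learned
# from members' company-transition data, not written by hand.
EMBEDDINGS = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def peer_group(company, embeddings, k=2):
    """Return the k companies most similar to `company` (illustrative helper)."""
    target = embeddings[company]
    scored = [
        (cosine(target, vec), name)
        for name, vec in embeddings.items()
        if name != company
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


def shrunk_mean(observations, prior_mean, prior_strength=5.0):
    """Weighted average of a prior mean and the sample mean.

    The prior counts as `prior_strength` pseudo-observations (an assumed
    tuning value). With no observations, the prior mean is returned as is.
    """
    n = len(observations)
    if n == 0:
        return prior_mean
    sample_mean = sum(observations) / n
    return (prior_strength * prior_mean + n * sample_mean) / (prior_strength + n)


# Usage: company "A" has a single log-salary observation, so its estimate is
# pulled toward a prior mean assumed to come from its peer group.
peers = peer_group("A", EMBEDDINGS, k=2)
estimate = shrunk_mean([11.4], prior_mean=11.0)
```

In this sketch, a company with few observations receives an estimate close to the peer-derived prior, while a company with many observations is dominated by its own data. This mirrors the stated goal of extending coverage to cases with limited or no data.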