Crystal: Employer Name Normalization in the Online Recruitment Industry
Qiaoling Liu, CareerBuilder; Faizan Javed*, CareerBuilder; Matt McNair, CareerBuilder
Entity linking maps entity mentions in text to the corresponding entities in a knowledge base (KB) and has many applications in both open and specific domains. For example, in the recruitment domain, linking employer names in job postings or resumes to entities in an employer KB is important to many business applications. In this paper, we focus on this employer name normalization task, which has several unique challenges: handling employer names from both job postings and resumes, leveraging the corresponding location context, and handling name variations, irrelevant input data, and noise in the KB. We present a system called CompanyDepot, which contains a machine-learning-based approach, CompanyDepot-ML, and a heuristic approach, CompanyDepot-H, to address these challenges in three steps: (1) searching for candidate entities using a customized search engine for the KB; (2) ranking the candidate entities using learning-to-rank methods or heuristics; and (3) validating the top-ranked entity via binary classification or heuristics. While CompanyDepot-ML shows better extensibility and flexibility, CompanyDepot-H serves as a strong baseline and a useful way to collect training data for CompanyDepot-ML. The proposed system achieves 2.5% to 21.4% higher coverage at the same precision level than an existing system used at CareerBuilder across multiple real-world datasets. Applying the system to the similar task of academic institution name normalization further shows the generalization ability of the method.
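The three-step flow described in the abstract (search, rank, validate) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the CompanyDepot implementation: the toy KB, the `Entity` fields, the stopword list, the token-overlap search, the Jaccard-plus-location score, and the fixed threshold are all hypothetical stand-ins for the customized search engine, the ranking method, and the validation step that the paper describes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical KB record; the real KB schema is not given in the abstract."""
    entity_id: str
    name: str
    city: str


# Toy knowledge base, invented for this example.
KB = [
    Entity("e1", "CareerBuilder", "Chicago"),
    Entity("e2", "Acme Staffing", "Atlanta"),
    Entity("e3", "Acme Hardware", "Denver"),
]

# Assumed list of legal suffixes to ignore when comparing names.
STOPWORDS = {"inc", "llc", "corp", "co", "ltd"}


def tokens(name):
    """Lowercase a name, split it into alphanumeric tokens, drop legal suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return {w for w in words if w not in STOPWORDS}


def search(query, kb):
    """Step 1 (search): keep entities that share at least one token with the query."""
    q = tokens(query)
    return [e for e in kb if q & tokens(e.name)]


def score(query, city, entity):
    """Step 2 (rank): heuristic score, Jaccard name overlap plus a location bonus."""
    q, t = tokens(query), tokens(entity.name)
    jaccard = len(q & t) / len(q | t) if q | t else 0.0
    bonus = 0.2 if city and city.lower() == entity.city.lower() else 0.0
    return jaccard + bonus


def normalize(query, city=None, kb=KB, threshold=0.6):
    """Step 3 (validate): return the top entity's id only if its score is high enough."""
    candidates = search(query, kb)
    if not candidates:
        return None  # irrelevant input, nothing in the KB resembles it
    best = max(candidates, key=lambda e: score(query, city, e))
    return best.entity_id if score(query, city, best) >= threshold else None
```

With this toy KB, `normalize("CareerBuilder, LLC", "Chicago")` returns `"e1"`, and `normalize("Acme", "Denver")` returns `"e3"` because the location context breaks the tie between the two Acme entries. Without a location, `normalize("Acme")` returns `None`: the validation threshold rejects the ambiguous match, and leaving such inputs unlinked is what separates coverage from precision.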
Filed under: Classification