User experience at social media and web platforms such as LinkedIn is heavily dependent on the performance and scalability of its products. Applications such as personalized search and recommendations require real-time scoring of millions of structured candidate documents associated with each query, with strict latency constraints. In such applications, the query incorporates the context of the user (in addition to search keywords if present), and hence can become very large, comprising of thousands of Boolean clauses over hundreds of document attributes. Consequently, candidate se-lection techniques need to be applied since it is infeasible to retrieve and score all matching documents from the underlying in-verted index. We propose CaSMoS, a machine learned candidate selection framework that makes use of Weighted AND (WAND) query. Our framework is designed to prune irrelevant documents and retrieve documents that are likely to be part of the top-k results for the query. We apply a constrained feature selection algorithm to learn positive weights for feature combinations that are used as part of the weighted candidate selection query. We have implemented and deployed this system to be executed in real time using LinkedIn’s Galene search platform. We perform extensive evaluation with different training data approaches and parameter settings, and investigate the scalability of the proposed candidate selection model. Our deployment of this system as part of LinkedIn’s job recommendation engine has resulted in significant reduction in latency (up to 25%) without sacrificing the quality of the retrieved results, thereby paving the way for more sophisticated scoring models.

Filed under: Recommender Systems