Generalized linear model (GLM) is a widely used class of models for statistical inference and response prediction problems. For instance, in order to recommend relevant content to a user or optimize for revenue, many web companies use logistic regression models to predict the probability of the user’s clicking on an item (e.g., ad, news article, job). In scenarios where the data is abundant, having a more fine-grained model at the user or item level would potentially lead to more accurate prediction, as the user’s personal preferences on items and the item’s specific attraction for users can be better captured. One common approach is to introduce ID-level regression coefficients in addition to the global regression coefficients in a GLM setting, and such models are called generalized linear mixed models (GLMix) in the statistical literature. However, for big data sets with a large number of ID-level coefficients, fitting a GLMix model can be computationally challenging. In this paper, we re-port how we successfully overcame the scalability bottleneck by applying parallelized block coordinate descent under the Bulk Synchronous Parallel (BSP) paradigm. We deployed the model in the LinkedIn job recommender system, and generated 20% to 40% more job applications for job seekers on LinkedIn.

Filed under: Big Data