GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction
XianXing Zhang*, LinkedIn; Bee-Chung Chen, LinkedIn; Liang Zhang, LinkedIn; Yitong Zhou, LinkedIn; Yiming Ma, LinkedIn; Deepak Agarwal, LinkedIn
Generalized linear models (GLMs) are a widely used class of models for statistical inference and response prediction problems. For instance, in order to recommend relevant content to a user or optimize for revenue, many web companies use logistic regression models to predict the probability of the user's clicking on an item (e.g., ad, news article, job). In scenarios where the data is abundant, having a more fine-grained model at the user or item level would potentially lead to more accurate prediction, as the user's personal preferences on items and the item's specific attraction for users can be better captured. One common approach is to introduce ID-level regression coefficients in addition to the global regression coefficients in a GLM setting; such models are called generalized linear mixed models (GLMix) in the statistical literature. However, for big data sets with a large number of ID-level coefficients, fitting a GLMix model can be computationally challenging. In this paper, we report how we successfully overcame the scalability bottleneck by applying parallelized block coordinate descent under the Bulk Synchronous Parallel (BSP) paradigm. We deployed the model in the LinkedIn job recommender system, and generated 20% to 40% more job applications for job seekers on LinkedIn.
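To make the model structure concrete, the following is a minimal NumPy sketch of the idea the abstract describes: a logistic regression whose linear predictor combines global (fixed-effect) coefficients with L2-regularized per-user and per-item intercepts, fitted by cycling gradient updates over the three coefficient blocks (a simple serial stand-in for the paper's parallelized block coordinate descent). All variable names and the synthetic data are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n observations, each with global features plus a user ID and an item ID.
n, d, n_users, n_items = 2000, 5, 20, 15
X = rng.normal(size=(n, d))
users = rng.integers(0, n_users, size=n)
items = rng.integers(0, n_items, size=n)

# Ground-truth parameters used only to generate labels for this toy example.
beta_true = rng.normal(size=d)                    # global coefficients
alpha_true = rng.normal(scale=0.5, size=n_users)  # per-user (ID-level) intercepts
gamma_true = rng.normal(scale=0.5, size=n_items)  # per-item (ID-level) intercepts

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = rng.binomial(1, sigmoid(X @ beta_true + alpha_true[users] + gamma_true[items]))

# Parameters to fit, and an L2 penalty on the ID-level effects
# (the random-effect prior in the mixed-model formulation).
beta = np.zeros(d)
alpha = np.zeros(n_users)
gamma = np.zeros(n_items)
lam = 0.1

def penalized_nll():
    s = X @ beta + alpha[users] + gamma[items]
    return -np.sum(y * s - np.log1p(np.exp(s))) + 0.5 * lam * (alpha @ alpha + gamma @ gamma)

nll_initial = penalized_nll()

for _ in range(50):
    # Block 1: gradient step on the global coefficients, ID-level effects held fixed.
    p = sigmoid(X @ beta + alpha[users] + gamma[items])
    beta -= 0.1 * (X.T @ (p - y)) / n

    # Block 2: per-user intercepts. Each user's update touches only that user's
    # observations, which is what makes this block trivially parallelizable.
    p = sigmoid(X @ beta + alpha[users] + gamma[items])
    cnt_u = np.maximum(np.bincount(users, minlength=n_users), 1)
    alpha -= 0.1 * (np.bincount(users, weights=p - y, minlength=n_users) + lam * alpha) / cnt_u

    # Block 3: per-item intercepts, same structure as the per-user block.
    p = sigmoid(X @ beta + alpha[users] + gamma[items])
    cnt_i = np.maximum(np.bincount(items, minlength=n_items), 1)
    gamma -= 0.1 * (np.bincount(items, weights=p - y, minlength=n_items) + lam * gamma) / cnt_i

nll_final = penalized_nll()
```

In the paper's setting the per-user and per-item blocks are fitted in parallel across IDs under BSP, with a synchronization barrier between blocks; the serial loop above only shows the coordinate structure being exploited.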
Filed under: Big Data