The SIGKDD Test of Time Award recognizes outstanding papers from past KDD Conferences beyond the last decade that have had an important impact on the data mining research community.
Yehuda Koren, Google Israel
Factorization meets the neighborhood: a multifaceted collaborative filtering model
Significance of Paper
Collaborative filtering is an established method for building recommendation models. This paper ties together several important variants of collaborative filtering in a way that improves model accuracy while also enhancing explainability and the ability to handle new "cold start" users.
The paper suggests a single framework combining neighborhood models with latent factor models (commonly known nowadays as "embedding models"), thereby capitalizing on the advantages of both approaches: Neighborhood models are most effective at detecting very localized relationships, whereas latent factor models are more effective at estimating overall structure that relates simultaneously to most or all items.
In passing, the paper also describes a convex neighborhood model amenable to global optimization, which outperforms the usual heuristic-based models by learning optimal item-item interpolation weights.
Another contribution is integrating different forms of user input into the model. Recommender systems rely on different types of input, including higher-quality explicit feedback and the more abundant implicit feedback, which indirectly reflects opinion through observed user behavior. The paper shows ways to integrate implicit and explicit feedback that increase fidelity to user preferences. Accordingly, one of the models described in the paper, SVD++, became a widely used high-performing baseline for measuring matrix factorization accuracy on explicit feedback data.
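To make the role of implicit feedback concrete, here is a minimal sketch of the SVD++ prediction rule as it is commonly stated: the predicted rating is mu + b_u + b_i + q_i . (p_u + |N(u)|^(-1/2) * sum of y_j over j in N(u)), where N(u) is the set of items for which user u gave implicit feedback. The function name, argument names, and the plain-list representation are assumptions made for this illustration, and the sketch covers prediction only, not training.

```python
import math

def svdpp_predict(mu, b_u, b_i, p_u, q_i, y_implicit):
    """Illustrative SVD++ prediction (hypothetical helper, not the paper's code).

    mu          -- global mean rating
    b_u, b_i    -- user and item biases
    p_u, q_i    -- explicit user and item factor vectors (same length)
    y_implicit  -- list of factor vectors y_j, one per item in N(u)
    """
    k = len(q_i)
    # Implicit part of the user profile: |N(u)|^(-1/2) * sum_j y_j.
    implicit = [0.0] * k
    if y_implicit:
        norm = 1.0 / math.sqrt(len(y_implicit))
        for y_j in y_implicit:
            for f in range(k):
                implicit[f] += norm * y_j[f]
    # Combined user representation: explicit factors plus implicit profile.
    user_vec = [p_u[f] + implicit[f] for f in range(k)]
    # Baseline estimate plus the user-item interaction term.
    return mu + b_u + b_i + sum(q_i[f] * user_vec[f] for f in range(k))
```

Under this formulation, a user with no ratings yet (p_u still at zero) can receive a non-trivial profile from implicit feedback alone, which is one way to read the "cold start" benefit mentioned above.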
Recommender systems provide users with personalized suggestions for products or services. These systems often rely on Collaborative Filtering (CF), where past transactions are analyzed in order to establish connections between users and products. The two more successful approaches to CF are latent factor models, which directly profile both users and products, and neighborhood models, which analyze similarities between products or users. In this work we introduce some innovations to both approaches. The factor and neighborhood models can now be smoothly merged, thereby building a more accurate combined model. Further accuracy improvements are achieved by extending the models to exploit both explicit and implicit feedback by the users. The methods are tested on the Netflix data. Results are better than those previously published on that dataset. In addition, we suggest a new evaluation metric, which highlights the differences among methods, based on their performance at a top-K recommendation task.