Curated by: Tao Li
The field of data mining increasingly adopts methods and algorithms from advanced matrix computations, graph theory, and optimization. In these methods, the data are described using matrix representations (for example, graphs are represented by their adjacency matrices), and the data mining problem is formulated as an optimization problem with matrix variables. The data mining task then becomes a process of minimizing or maximizing a desired objective function of matrix variables.
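As a minimal sketch of this formulation (a toy example, not drawn from any particular paper below): a graph is stored as its adjacency matrix, and a graph cut becomes a quadratic objective of a matrix variable, since for a ±1 partition indicator x, the quantity (1/4)·xᵀLx with Laplacian L = D − A counts the edges crossing the cut.

```python
import numpy as np

# Toy undirected graph on 4 nodes, represented by its adjacency matrix A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Degree matrix and (unnormalized) graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# Encode a two-way partition by an indicator vector x with entries +1/-1.
# The quadratic form (1/4) * x^T L x equals the number of edges crossing
# the cut, so graph partitioning is minimization of a matrix objective.
x = np.array([1, 1, 1, -1], dtype=float)  # split {0, 1, 2} vs {3}
cut_value = 0.25 * x @ L @ x
print(cut_value)  # 1.0 -- exactly one edge (2-3) crosses this cut
```

Relaxing the ±1 constraint on x to real values leads directly to spectral methods, where the minimizer is an eigenvector of L.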
Prominent examples include spectral clustering, matrix factorization, tensor analysis, and regularization. These matrix-formulated, optimization-centric methodologies are rapidly evolving into a popular research area for solving challenging data mining problems. They are amenable to rigorous analysis and benefit from the well-established knowledge in linear algebra, graph theory, and optimization accumulated over centuries. They are also simple to implement and easy to understand, in comparison with probabilistic, information-theoretic, and other methods. In addition, they are well suited to parallel and distributed processing for solving large-scale problems. Last but not least, these methodologies are quite flexible and can be used to formulate a large number of data mining tasks.
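One of the examples above, matrix factorization, can be sketched in a few lines. The sketch below (an illustrative toy, with sizes and seed chosen arbitrarily) uses the classic Lee-Seung multiplicative updates for nonnegative matrix factorization, minimizing ‖X − WH‖²_F over nonnegative matrix variables W and H:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonnegative data matrix X (think documents x terms), approximated as
# X ~ W H with an inner dimension k, by minimizing ||X - W H||_F^2.
X = rng.random((20, 10))
k = 3
W = rng.random((20, k)) + 0.1
H = rng.random((k, 10)) + 0.1

err0 = np.linalg.norm(X - W @ H)

# Lee-Seung multiplicative updates: each step keeps W and H nonnegative
# and does not increase the reconstruction error.
eps = 1e-12  # guard against division by zero
for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(X - W @ H)
print(err < err0)  # True: the objective decreased
```

The updates are purely matrix products, which is one reason such formulations parallelize and distribute well, as noted above.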
Workshop on Algorithms for Modern Massive Datasets (MMDS)
Related KDD2016 Papers
|Title|Author(s)|
|---|---|
|Optimally Discriminative Choice Sets in Discrete Choice Models: Application to Data-Driven Test Design|Igor Labutov*, Cornell University|
|Joint Optimization of Multiple Performance Metrics in Online Video Advertising|Sahin Geyik*, Turn Inc.; Sergey Faleev, Turn Inc.; Jianqiang Shen, Turn Inc.; Sean O'Donnell, Turn Inc.; Santanu Kolay, Turn Inc.|
|MAP: Frequency-Based Maximization of Airline Profits based on an Ensemble Forecasting Approach|Bo An; Haipeng Chen, Nanyang Technological University; Noseong Park*, University of Maryland; V.S. Subrahmanian, University of Maryland|
|Matrix Computations and Optimization in Apache Spark|Reza Zadeh*, Stanford University; Xiangrui Meng; Alexander Ulanov; Burak Yavuz; Li Pu; Shivaram Venkataraman; Evan Sparks; Aaron Staple; Matei Zaharia|
|Online dual decomposition for performance and delivery-based distributed ad allocation|Jim Huang*, Amazon; Rodolphe Jenatton, Amazon; Cedric Archambeau, Amazon|
|Lossless Separation of Web Pages into Layout Code and Data|Adi Omari*, Technion; Benny Kimelfeld, Technion; Sharon Shoham, Academic College of Tel Aviv Yaffo; Eran Yahav, Technion|
|Email Volume Optimization at LinkedIn|Rupesh Gupta*, LinkedIn; Xiaoyu Chen; Guanfeng Liang; Romer Rosales, LinkedIn; Hsiao-Ping Tseng; Ravi Kiran Holur Vijay|
|Portfolio Selections in P2P Lending: A Multi-Objective Perspective|Hongke Zhao*, USTC; Guifeng Wang, USTC; Yong Ge, UNC Charlotte; Qi Liu, University of Science and Technology of China; Enhong Chen|
|Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices|Yasuo Tabei*, JST; Hiroto Saigo, Kyushu Institute of Technology; Yoshihiro Yamanishi, Kyushu University; Simon Puglisi, University of Helsinki|