2017 SIGKDD Test of Time Award Winners2017 KDD Test of Time Award Award Winner
The SIGKDD Test of Time Award recognizes outstanding papers from past KDD Conferences beyond the last decade that have had an important impact on the data mining research community.
Thorsten Joachims, Cornell University
Training Linear SVMs in Linear Time
Significance: Support Vector Machines (SVMs) are among the most popular machine learning techniques for high-dimensional and sparse data, but their efficient implementations have been difficult to obtain. Many popular implementations of SVM suffer from super-linear scaling behavior which makes their use inefficient or even intractable on large datasets. This paper proposes the first general training algorithm for linear SVMs that provably scales O(sn) for classification and O(snlog (n)) for ordinal regression, where s is the average number of non-zero features and n is the number of samples. This scaling is very attractive for high-dimensional and sparse data. The algorithm exploits a simple Cutting-Plane Algorithm for training linear SVMs that is shown to converge. Compared to previous methods, the algorithm has several advantages. First, it is very simple and easy to implement. Second, it is several orders of magnitude faster than existing decomposition methods on large classification problems. Third, the algorithm has a meaningful stopping criterion that directly relates to training error. This avoids wasting time on solving the optimization problem to a higher precision than necessary. Finally, the algorithm can handle ordinal regression problems with large data sets. The study opens several areas for research. Since it takes only a small number of sequential iterations through the data, it is promising for parallel implementations using out-of-core memory. Also, the algorithm can in principle be applied to SVMs with Kernels.
Original Abstract: Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n as well as a large number of features N, while each example has only s << N non-zero features. This paper presents a Cutting-Plane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems. The algorithm is based on an alternative, but equivalent formulation of the SVM optimization problem. Empirically, the Cutting-Plane Algorithm is several orders of magnitude faster than decomposition methods like SVM-Light for large datasets.