KDD Papers

A Parallel and Primal-Dual Sparse Method for Extreme Classification

Ian Yen (Carnegie Mellon University); Xiangru Huang (University of Texas at Austin); Wei Dai (Carnegie Mellon University); Pradeep Ravikumar (Carnegie Mellon University); Inderjit Dhillon (University of Texas at Austin); Eric Xing (Carnegie Mellon University)


Extreme classification considers the problem of multiclass or multilabel prediction when there is a huge number of classes, a scenario that arises in many real-world applications such as text and image tagging. In this setting, standard classification methods, whose complexity is linear in the number of classes, become intractable, while enforcing structural constraints among classes (such as low-rank or tree structure) to reduce complexity often sacrifices accuracy for efficiency. The recent PD-Sparse method addresses this issue with an algorithm that is sublinear in the number of variables, exploiting the primal-dual sparsity inherent in the max-margin loss. However, its objective requires training the models of all classes jointly, which incurs large memory consumption and rules out the simple parallelization scheme that a one-versus-all method can easily exploit. In this work, we propose a primal-dual sparse method that enjoys the same parallelizability and space efficiency as the one-versus-all approach while retaining complexity sublinear in the number of classes. On several large-scale benchmark data sets, the proposed method achieves accuracy competitive with state-of-the-art methods while reducing training time from days to tens of minutes, compared to existing parallel or sparse methods, on a cluster of 100 cores.
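To make the two ingredients the abstract combines concrete, here is a minimal, hypothetical sketch (not the paper's PD-Sparse algorithm; all names and data are illustrative): each class's max-margin sub-problem is solved independently, so classes can be trained in parallel as in one-versus-all, and an L1 soft-threshold step keeps each per-class model sparse so only nonzero weights are stored.

```python
import numpy as np

def train_one_class(X, y, k, epochs=20, lr=0.1, l1=0.01):
    """Train a binary hinge-loss (max-margin) model for class k.

    Hypothetical illustration: subgradient descent with an L1
    soft-threshold step, so the learned weight vector is sparse.
    """
    n, d = X.shape
    w = np.zeros(d)
    t = np.where(y == k, 1.0, -1.0)           # one-versus-all labels
    for _ in range(epochs):
        margins = t * (X @ w)
        viol = margins < 1                    # only margin violators contribute
        grad = -(X[viol] * t[viol, None]).sum(axis=0) / n
        w -= lr * grad
        # soft threshold: drives small weights to exactly zero (sparsity)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return {j: w[j] for j in np.nonzero(w)[0]}  # store only nonzeros

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = (X[:, 0] > 0).astype(int)                 # toy 2-class labels from feature 0
# This loop is embarrassingly parallel: each class is an independent job,
# which is the property a joint objective over all classes would break.
models = {k: train_one_class(X, y, k) for k in range(2)}
```

Because the per-class jobs share no state, distributing them over many cores (the paper reports a cluster of 100) requires no synchronization beyond gathering the sparse models at the end.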