KDD Papers

Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping

Bin Gu (University of Texas at Arlington); Guodong Liu (University of Texas at Arlington); Heng Huang (University of Texas at Arlington)


Variable selection that identifies homogeneous groups of features is crucial for high-dimensional data analysis. The octagonal shrinkage and clustering algorithm for regression (OSCAR) is an important sparse regression approach that performs automatic feature grouping through an $\ell_{1}$ norm and a pairwise $\ell_{\infty}$ norm. However, because the penalty has an overly complex representation (especially the pairwise $\ell_{\infty}$ norm), OSCAR has so far lacked a solution path algorithm, a tool that is especially useful for tuning the model. To address this challenge, we propose a groups-keeping solution path algorithm for OSCAR (OscarGKPath). Given a set of homogeneous groups of features and an accuracy $\varepsilon$, OscarGKPath fits the solutions over an interval of regularization parameters while keeping the feature groups fixed. The entire solution path is obtained by combining multiple such intervals. Theoretically, we prove that every solution on the path produced by OscarGKPath strictly satisfies the given accuracy $\varepsilon$. Experimental results on a variety of datasets confirm the effectiveness of OscarGKPath and show that it outperforms the batch algorithm for cross validation.
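As an illustration of the penalty mentioned in the abstract, the following minimal sketch evaluates the OSCAR regularizer in the form commonly given in the literature, $\lambda_1 \|\beta\|_1 + \lambda_2 \sum_{i<j} \max(|\beta_i|, |\beta_j|)$. The symbols $\lambda_1$, $\lambda_2$ and the function names `oscar_penalty` and `oscar_penalty_sorted` are assumptions introduced here for illustration; they do not appear in the abstract, and this is not the OscarGKPath algorithm itself.

```python
from itertools import combinations


def oscar_penalty(beta, lam1, lam2):
    """Direct evaluation: l1 term plus the sum of pairwise l-infinity norms."""
    abs_beta = [abs(b) for b in beta]
    l1_term = sum(abs_beta)
    # For each pair i < j, the l-infinity norm of (beta_i, beta_j) is max(|beta_i|, |beta_j|).
    pairwise_term = sum(max(a, b) for a, b in combinations(abs_beta, 2))
    return lam1 * l1_term + lam2 * pairwise_term


def oscar_penalty_sorted(beta, lam1, lam2):
    """Equivalent evaluation via sorting.

    With magnitudes sorted in ascending order, the k-th smallest one
    (k = 0, 1, ...) is the maximum in exactly k pairs, so the penalty
    becomes a weighted l1 norm with weights lam1 + lam2 * k.
    """
    abs_sorted = sorted(abs(b) for b in beta)
    return sum((lam1 + lam2 * k) * a for k, a in enumerate(abs_sorted))


if __name__ == "__main__":
    beta = [1.0, -2.0, 0.5]  # hypothetical coefficient vector
    print(oscar_penalty(beta, lam1=1.0, lam2=0.5))
    print(oscar_penalty_sorted(beta, lam1=1.0, lam2=0.5))
```

The sorted form shows that larger coefficients receive larger weights, which is one way to see why the penalty tends to pull coefficients of similar magnitude toward a common value and thereby form feature groups.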