Prediction-time Efficient Classification Using Feature Computational Dependencies
Liang Zhao (George Mason University); Amir Alipour-Fanid (George Mason University); Martin Slawski (George Mason University); Kai Zeng (George Mason University)
As machine learning methods are deployed in more and more real-world applications that operate under computational budgets, such constraints increasingly need to be integrated systematically into model selection and model optimization. One such computational resource is the time needed to evaluate predictions on test instances. A substantial body of work now addresses the joint optimization of accuracy and test-time efficiency by considering the time costs of feature generation and model prediction. In many applications, however, the feature generation process involves significant redundant computations that are shared across different features. Although eliminating such redundancies would reduce the time cost substantially, there has been little research in this area because of two substantial technical challenges: 1) the lack of an effective formulation for feature computational dependency; and 2) the nonconvex and discrete nature of the optimization over feature computational dependency. To address these problems, this paper first proposes a heterogeneous hypergraph to represent the feature computational dependency, and then proposes a framework that jointly optimizes the accuracy and the exact test-time cost based on a given feature computational dependency. A tight continuous approximation to this original problem is developed based on a non-monotone nonconvex regularization term. Finally, an effective nonconvex optimization algorithm is proposed to solve the problem, along with a theoretical analysis of the convergence conditions. Extensive experiments on eight synthetic datasets and six real-world datasets demonstrate the proposed models’ outstanding performance in terms of both accuracy and prediction-time cost.
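To make the redundancy argument in the abstract concrete, the following minimal sketch compares two ways of costing a selected feature set: summing each feature's cost independently, and charging each shared intermediate computation only once. It is an illustration under stated assumptions, not the paper's formulation: the feature names, computation names, costs, and the flat feature-to-computations mapping are all hypothetical stand-ins for the heterogeneous hypergraph the paper proposes.

```python
from typing import Dict, Iterable, Set

# Hypothetical intermediate computations and their time costs (arbitrary units).
COMPUTATION_COST: Dict[str, float] = {
    "fft": 5.0,
    "power_spectrum": 2.0,
    "mean": 1.0,
    "variance": 1.5,
}

# Hypothetical dependencies: each feature lists every computation it requires.
FEATURE_DEPENDENCIES: Dict[str, Set[str]] = {
    "spectral_centroid": {"fft", "power_spectrum"},
    "spectral_energy": {"fft", "power_spectrum"},
    "signal_mean": {"mean"},
    "signal_std": {"mean", "variance"},
}


def naive_cost(selected: Iterable[str]) -> float:
    """Cost when every feature recomputes all of its own dependencies."""
    return sum(
        sum(COMPUTATION_COST[c] for c in FEATURE_DEPENDENCIES[f])
        for f in selected
    )


def shared_cost(selected: Iterable[str]) -> float:
    """Cost when each required computation is executed exactly once."""
    needed: Set[str] = set()
    for f in selected:
        needed |= FEATURE_DEPENDENCIES[f]
    return sum(COMPUTATION_COST[c] for c in needed)


if __name__ == "__main__":
    chosen = ["spectral_centroid", "spectral_energy", "signal_std"]
    print("naive cost: ", naive_cost(chosen))
    print("shared cost:", shared_cost(chosen))
```

In this toy setting, the shared cost depends on the union of required computations rather than on the number of selected features, which is why it is a discrete, non-additive function of the selection. That is the kind of cost the abstract says needs a continuous approximation before it can be optimized jointly with accuracy.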