2002 SIGKDD Innovation Award: Dr. Jerome H. Friedman

2002 SIGKDD Innovation Award Winner

Jerry Friedman has contributed a remarkable array of topics and methodologies to data mining and machine learning during the last 25 years.

In 1977, as leader of the numerical methods group at the Stanford Linear Accelerator Center (SLAC), he coauthored several algorithms for speeding up nearest-neighbor classifiers.

In the following seven years, he collaborated with Leo Breiman, Richard Olshen, and Charles Stone to produce a landmark work in decision tree methodology, "Classification and Regression Trees" (1984), and released the commercial product CART(R). This work introduced the Gini, twoing, and ordered twoing splitting rules, cost-complexity pruning, oblique splitters, the use of a misclassification cost matrix to influence the growing of trees, and the application of cross validation to decision trees. Part of this work was prefigured in his 1977 paper on decision tree induction.
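As an illustration of the Gini splitting rule mentioned above, the following minimal sketch scores candidate thresholds on a single numeric predictor by the weighted Gini impurity of the two child nodes. It is not the CART(R) implementation; the function names `gini` and `best_split` are hypothetical, and pruning, cost matrices, and the other splitting rules are left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Find the threshold on one numeric predictor with the lowest weighted impurity.

    Returns (threshold, weighted_impurity). The threshold is None when no
    split improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, gini(ys)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (lo + hi) / 2, score
    return best_threshold, best_score


if __name__ == "__main__":
    print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

A tree grower would apply a search of this kind to every predictor at every node and then recurse on the two child nodes.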

During this time, he also introduced Projection Pursuit Regression (PPR) for predictive modeling and interactive data visualization.

Although PPR has had only a modest following, it was arguably the first instance of a feed-forward, single hidden layer, back propagation neural network with a remarkable twist: the activation function is itself estimated as part of the learning process and the number of hidden units to use is determined dynamically in a stagewise process (1974, 1981, 1987).

In 1991 Jerry extended recursive partitioning ideas to regression in his Multivariate Adaptive Regression Splines (MARS(tm)). In MARS, linear and logistic regressions are built up through searching for breakpoints in the predictor space. Variable selection, missing value handling, and variable transformation are all automated. MARS can be described as the first truly successful stepwise regression methodology. Richard De Veaux, in a comparative study of MARS and neural networks (1993), found that MARS frequently outperformed neural networks in engineering applications and trained hundreds of times faster; similar findings have been reported by others more recently. In 1994, Jerry extended the MARS methodology to permit a dynamic spline version of discriminant analysis.
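The breakpoint search described above can be sketched in a heavily simplified form: for one predictor, try each observed value as a candidate breakpoint (knot), fit a least-squares line to the one-sided term max(0, x - knot), and keep the knot with the smallest squared error. This is only an assumed illustration, not the MARS(tm) procedure itself, which builds many such terms across predictors and then prunes them. The names `hinge`, `fit_line`, and `best_knot` are hypothetical.

```python
def hinge(x, knot):
    """One-sided piecewise-linear term: zero up to the knot, linear after it."""
    return max(0.0, x - knot)


def fit_line(us, ys):
    """Ordinary least squares for y = intercept + slope * u; returns (intercept, slope, sse)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    if sxx == 0:
        slope = 0.0  # the term is constant, so only the intercept can be fitted
    else:
        slope = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys)) / sxx
    intercept = mean_y - slope * mean_u
    sse = sum((y - intercept - slope * u) ** 2 for u, y in zip(us, ys))
    return intercept, slope, sse


def best_knot(xs, ys):
    """Search the observed x values for the knot that minimizes squared error."""
    best = None
    for knot in sorted(set(xs)):
        us = [hinge(x, knot) for x in xs]
        intercept, slope, sse = fit_line(us, ys)
        if best is None or sse < best[3]:
            best = (knot, intercept, slope, sse)
    return best


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2 + 3 * max(0, x - 4) for x in xs]  # synthetic data with a breakpoint at 4
    print(best_knot(xs, ys))
```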

In the early 1990s Jerry focused on interactive data mining methods, introducing the Patient Rule Induction Method (PRIM, 1997), which he described as "Bump Hunting in High Dimensional Data." PRIM searches for data regions containing unusually high concentrations (or values) of a target variable and allows the analyst to interactively modify its rules and stretch or shrink the "boxes" defining the regions in question. PRIM has become one of the analytical methods of choice at Australia's CSIRO, a government-funded R&D and consulting lab with extensive data mining activity.
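The box search described above can be sketched as a greedy "peeling" loop: start from a box containing all observations and repeatedly trim a small fraction of points from whichever face leaves the highest mean target value, stopping before the box holds too few points. This is a rough sketch under assumptions, not the published PRIM procedure; it omits the interactive stretching of boxes, and the names `prim_peel`, `alpha`, and `min_support` are hypothetical.

```python
def prim_peel(X, y, alpha=0.1, min_support=0.2):
    """Greedy peeling sketch.

    X is a list of rows (lists of numbers), y a list of target values.
    alpha is the fraction of in-box points trimmed per step; min_support is
    the smallest fraction of all points the final box may contain.
    Returns (box, inside): box is a list of [low, high] bounds per dimension,
    inside is the list of row indices that fall in the box.
    """
    n, d = len(X), len(X[0])
    min_count = max(1, round(min_support * n))
    inside = list(range(n))
    box = [[min(row[j] for row in X), max(row[j] for row in X)] for j in range(d)]

    while len(inside) > min_count and len(inside) >= 2:
        best = None
        k = max(1, int(alpha * len(inside)))  # number of points to trim per face
        for j in range(d):
            vals = sorted(X[i][j] for i in inside)
            if k >= len(vals):
                continue
            candidates = (
                ("low", vals[k]),        # raise the lower bound
                ("high", vals[-k - 1]),  # lower the upper bound
            )
            for side, bound in candidates:
                if side == "low":
                    keep = [i for i in inside if X[i][j] >= bound]
                else:
                    keep = [i for i in inside if X[i][j] <= bound]
                if not keep or len(keep) == len(inside):
                    continue  # this peel removes nothing or everything
                mean = sum(y[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, j, side, bound, keep)
        if best is None or len(best[4]) < min_count:
            break  # no admissible peel remains
        _, j, side, bound, keep = best
        box[j][0 if side == "low" else 1] = bound
        inside = keep
    return box, inside


if __name__ == "__main__":
    # Synthetic example: the target is 1 only in the middle of the range.
    X = [[i / 100] for i in range(100)]
    y = [1.0 if 40 <= i < 60 else 0.0 for i in range(100)]
    box, inside = prim_peel(X, y)
    print(box, len(inside))
```

Trimming only a small fraction at each step is what makes the search "patient": a poor early cut removes few points, so later steps can still recover.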

More recently, Jerry has focused on the study of boosting, both to understand why it is so successful and to develop improved boosting methodology. In a key article co-authored with Stanford statisticians Trevor Hastie and Rob Tibshirani, Jerry showed that boosting is a form of additive logistic regression and he identified the objective function that boosting seeks to maximize. He followed up with Stochastic Gradient Boosting, which generalizes boosting to a very large class of problems, eliminating the tendency of classical boosting to seriously mistrack when presented with mislabeled target data. In stochastic gradient boosting, small trees, very slow learning rates, mandatory sampling from the training data, and redefinition of the target variable are all combined to produce a remarkably fast and robust learner, capable of handling both regression and classification even under fairly adverse circumstances of dirty data. The methodology, called "MART" for Multiple Additive Regression Trees, includes visualization to convey the relationships between the target and predictors; it has been released commercially as TreeNet(tm).
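The ingredients listed above (small trees, a slow learning rate, sampling from the training data, and a redefined target) can be sketched for squared-error regression on one predictor. Here the "small tree" is a one-split stump and the redefined target is the current residual. This is an assumed, minimal illustration rather than the MART or TreeNet(tm) implementation, and the names `fit_stump`, `boost`, and `predict` are hypothetical.

```python
import random


def fit_stump(xs, rs):
    """Fit a one-split regression stump to targets rs; returns (threshold, left_mean, right_mean)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for pos in range(1, len(order)):
        lo, hi = xs[order[pos - 1]], xs[order[pos]]
        if lo == hi:
            continue  # cannot split between equal predictor values
        left = [rs[i] for i in order[:pos]]
        right = [rs[i] for i in order[pos:]]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, lm, rm)
    if best is None:  # all predictor values equal: fall back to a constant
        m = sum(rs) / len(rs)
        return xs[0], m, m
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees=200, learning_rate=0.05, subsample=0.5, seed=0):
    """Stagewise fit: each stump is trained on the residuals of a random subsample."""
    rng = random.Random(seed)
    n = len(xs)
    base = sum(ys) / n
    preds = [base] * n
    stumps = []
    m = min(n, max(2, int(subsample * n)))
    for _ in range(n_trees):
        idx = rng.sample(range(n), m)               # sampling without replacement
        residuals = [ys[i] - preds[i] for i in idx]  # redefined target for this stage
        t, lm, rm = fit_stump([xs[i] for i in idx], residuals)
        stumps.append((t, lm, rm))
        for i in range(n):                           # small step toward the stump's fit
            preds[i] += learning_rate * (lm if xs[i] <= t else rm)
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps)


if __name__ == "__main__":
    xs = [float(i) for i in range(40)]
    ys = [0.0 if x < 20 else 1.0 for x in xs]
    model = boost(xs, ys)
    mse = sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    print(mse)
```

For other loss functions the residual would be replaced by the negative gradient of the loss at the current prediction, which is where the robustness to mislabeled targets can be introduced.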

Jerry has written a series of expository articles and a substantial book seeking to explain data mining to experienced data analysts and to relate machine learning to statistical foundations. Taken together, this list of new methodology, including CART, MARS, PRIM, PPR, and Gradient Boosting, constitutes one of the broadest ranges of contributions by any one person in the field.
