KDD 2005 - Introduction to Logistic Regression: Aug 21-24, Chicago, IL, USA

Tutorial: Introduction to Logistic Regression

Presenter:

Dave Lewis (David D. Lewis Consulting)
kdd05tutorial at daviddlewis.com
http://www.daviddlewis.com

Abstract:

This tutorial will present a broad-ranging introduction to logistic regression, a flexible, effective approach to supervised learning of classifiers. The emphasis is on diversity of perspectives. Logistic regression has been discovered and rediscovered under a range of names and notations, in a variety of fields. The tutorial will attempt to present the best of the insights from many of these fields. The tutorial also indirectly serves as an overview of some important concepts in modern machine learning, including loss functions, regularization, model misspecification, optimization algorithms, and the extent to which effectiveness of trained models can be predicted.

Issues relevant to real-world data mining, such as high dimensionality, sparse data, and noise, will be discussed along with their implications. Where possible, the presentation will emphasize qualitative and graphical presentations of information rather than equations. Computational examples using publicly available software and datasets will be presented.

Intended Audience:

The target audience is data mining researchers, as well as practitioners with a good understanding of basic concepts in statistics and supervised machine learning. Researchers will get a compact, unified introduction to a technique with a wide range of uses in machine learning and data mining. They will increase their ability to understand and draw insights from the large and diverse literature on logistic regression, with its often contradictory and confusing jargon and notation. Open problems, research directions, and interesting connections between fields will be discussed. Practitioners will learn about an effective technique for many practical problems in data mining and prediction. They will gain a broad perspective that should aid them in understanding confusing terminology and contradictory claims, both in the research literature and by data mining software vendors.

Biography:

David D. Lewis is an independent consultant working in the areas of information retrieval, machine learning, and natural language processing. Dr. Lewis has worked with startups, large corporations, nonprofits, and governmental organizations on access to and mining of text data. He has collaborated extensively with university researchers in computer science, statistics, and other fields, and co-founded a data mining software company. Lewis has published more than forty scientific papers and holds six patents on information retrieval and text mining technology. He was a member of committees that designed and administered many of the U.S. government MUC and TREC evaluations of language processing technologies.

From 1992 to 2000, Dave Lewis was a researcher at AT&T Labs and Bell Labs. From 1991 to 1992, he was a research faculty member at the Center for Information and Language Studies at the University of Chicago. He received his Ph.D. in Computer Science from the University of Massachusetts at Amherst in 1992.
