Invited Speaker Abstracts


List of Invited Speakers:

David Haussler
Prof. Tomasz Imielinski
Jerome H. Friedman


KDD95 HOME Page


Invited Speaker 1: 11:00-11:50am in (408-BC) on Sunday, August 20

USING HIDDEN MARKOV MODELS
TO SEARCH BIOSEQUENCE DATABASES

David Haussler
UCSC

Abstract

As the Human Genome Project moves into high gear, biosequence databases are growing at a phenomenal rate. It is becoming increasingly important to develop more sensitive methods to search these databases to discover relationships between newly sequenced pieces of DNA and similar pieces that have already been sequenced, and whose function may already be known. We give a brief introduction to hidden Markov models, and then show how they can be used for this purpose.


Invited Speaker 2: 9:40-10:30am in (408-BC) on Monday, August 21

A DATABASE PERSPECTIVE
ON KNOWLEDGE DISCOVERY

Prof. Tomasz Imielinski
Rutgers University

Abstract

We argue that database mining, as it is understood today, still has very little of a database component, which is probably why it has attracted relatively few database researchers. We criticize the simplistic point of view in which database mining is viewed merely as machine learning with very large data sets. We offer an alternative vision in which the goal of database mining is to provide support for "ad hoc" rule discovery tasks, analogous to the way contemporary database systems support ad hoc queries. We propose a set of primitives which can serve as building blocks for database mining applications. We also discuss a new set of performance criteria which we believe a good data mining system should satisfy.


Invited Speaker 3: 1:30-2:30pm in (408-BC) on Monday, August 21

INTELLIGENT LOCAL LEARNING
FOR PREDICTION IN HIGH DIMENSIONS

Jerome H. Friedman
Department of Statistics
and
Stanford Linear Accelerator Center, Stanford University

Abstract

Local learning methods are among the earliest proposed for supervised learning. Local methods assign a weight to each training observation that regulates its influence on the training process. This weight depends upon the location of the training point in the input variable space relative to that of the point to be predicted. Training observations closer to the prediction point generally receive higher weights. Thus, a distance measure must be defined on the input variable space, and predictive accuracy can depend strongly on the particular choice. For example, simple Euclidean distance is well known to suffer from the curse of dimensionality, which limits its effectiveness in high dimensional settings (many inputs).
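
The weighting scheme described above can be illustrated with a minimal sketch. It is not taken from the talk: it assumes a Gaussian kernel on plain Euclidean distance and a weighted average of the training outputs as the prediction. The names `euclidean`, `local_predict`, and `bandwidth` are hypothetical and chosen only for this illustration.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_predict(train_x, train_y, query, bandwidth=1.0):
    """Predict the output at `query` as a distance-weighted average.

    Assumption for this sketch: each training observation gets a
    Gaussian weight exp(-d^2 / (2 * bandwidth^2)), where d is its
    Euclidean distance to the query point. Closer observations
    therefore receive higher weights.
    """
    weights = [
        math.exp(-0.5 * (euclidean(x, query) / bandwidth) ** 2)
        for x in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: the query is too far from every
        # training point for this bandwidth.
        raise ValueError("no training point has non-zero weight")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Usage: three one-dimensional training observations on the line y = x.
train_x = [(0.0,), (1.0,), (2.0,)]
train_y = [0.0, 1.0, 2.0]
prediction = local_predict(train_x, train_y, (1.0,), bandwidth=1.0)
```

In this sketch the distance function is fixed in advance and treats every input variable equally, which is exactly the choice that degrades as the number of inputs grows.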

For any given problem there is an optimal definition of distance that depends both on the (unknown) underlying input-output relationship, and on the location (in the input variable space) of the point to be predicted. This talk will describe new types of ("intelligent") local learning methods that attempt to exploit this by using the training data to derive an appropriate distance measure separately for each individual prediction point. The connection between these new types of local learning methods and flexible function fitting techniques such as CART, C4.5, MARS, projection pursuit, radial basis functions, and feed-forward neural networks will be discussed.