ACM Special Interest Group on Knowledge Discovery & Data Mining

KDD-2000

Sixth ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining
August 20-23, 2000
Boston, MA, USA

High Performance Data Mining

Vipin Kumar, Mohammed Zaki

Abstract:

A fundamental problem in data mining is to develop algorithms and systems that scale with increases in the amount of data and in its dimensionality and complexity. Because of the huge size of the data and the amount of computation involved in mining algorithms, parallel and distributed processing is often considered an essential component of a successful data mining solution.

The goal of this tutorial is to provide researchers, practitioners, and advanced students with an introduction to high performance data mining. The focus will be on algorithms, software tools, and system architectures appropriate for mining massive data sets using techniques from scalable, parallel and distributed computing.

The tutorial will provide 1) an overview of fundamental parallel and distributed data mining algorithms covering common techniques such as classification, associations, sequences, and clustering; 2) an introduction to some of the basic architectural frameworks for high performance data mining systems; and 3) an understanding of some of the outstanding algorithmic and systems issues that arise when mining large data sets. With this knowledge, the audience should be better prepared to mine larger data sets in practice or to undertake research in this area.

Biographies of Organizers:

Vipin Kumar is a Professor of Computer Science at the University of Minnesota. His current research focuses on parallel computing and data mining. His past research has produced highly efficient algorithms and software such as Metis, hMetis, and PSPASES. He has authored over 100 research articles and has coedited or coauthored 5 books, including the widely used textbook "Introduction to Parallel Computing". Kumar serves on the editorial boards of several prominent journals in parallel computing. He is a Fellow of IEEE and the Minnesota Supercomputer Institute, and is a member of SIAM and ACM.

Mohammed J. Zaki is an Assistant Professor of Computer Science at Rensselaer Polytechnic Institute. His research interests include the design of efficient, scalable, and parallel algorithms and systems for various data mining tasks. He has published over 40 papers in this area, and he recently co-edited the book "Large-scale Parallel Data Mining", LNAI State-of-the-Art-Survey, Vol. 1759, Springer-Verlag, 2000. He was a co-chair of the ACM SIGKDD Workshop on Large-scale Parallel KDD Systems (1999), and is a co-chair of the IEEE IPDPS Workshop on High Performance Data Mining (2000). He is a member of ACM and IEEE.
