Association for Computing Machinery
ACM Special Interest Group on Knowledge Discovery & Data Mining

 

 

KDD-2000

Sixth ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining

August 20-23, 2000
Boston, MA, USA

Knowledge Discovery in Biological Domains

 

I. Jurisica, I. Rigoutsos and A. Floratos

 

Abstract:

Biological research is generating data at an explosive rate. The Human Genome Project is expected to identify the codes for over 3 billion bases by the year 2003. This will provide the code for about 100,000 proteins. Analyzing this volume of data and using it intelligently is a challenge because of its complexity, its multiple interdependent factors, the uncertainty of these dependencies, and the continuous evolution of our understanding of the data. In general, reasoning with biomedical information requires flexible knowledge representation structures and powerful knowledge-discovery tools.

 

This tutorial provides an introduction to the latest computational techniques for data mining and knowledge discovery in biological domains. We will explore how well traditional data-mining techniques for alphanumeric, visual and relational data fit biology. After characterizing biological problems, we will present basic definitions and diverse algorithms. These will cover scientific discovery, pattern identification, organization, summarization and description, clustering, classifying, associating and predicting, and information extraction. We will give an overview of current state-of-the-art commercial and academic systems, with an emphasis on successful examples of data mining and knowledge discovery in biology. The examples will include amino acid sequence analysis, homology detection, elucidation of biological function, protein structure prediction and identification of related proteins, systematic generation of bio-dictionaries(TM) and their exploitation, analysis of biological effects, model generation and use, DNA microarray analysis, data curation, and hypothesis generation and testing. We will identify limitations of generic approaches and define the problems and issues that must be addressed to successfully mine biological sequence and structure databases. We will close by discussing future directions of knowledge discovery in biology and the relevance of knowledge visualization, knowledge evolution and management of scientific knowledge.

 

Biographies of Organizers:

Igor Jurisica received a PhD degree in 1998 from the University of Toronto, and MSc degrees in Electrical Engineering from Slovak Technical University and in Computer Science from the University of Toronto in 1991 and 1993, respectively. He was appointed to FIS as an Assistant Professor in January 1998, and he holds a position as a Visiting Scientist at the IBM Toronto Laboratory, Centre for Advanced Studies. His research interests are focused on knowledge management and computational biology. He has worked on industrial projects in biomedical and engineering domains, co-chaired several workshops on knowledge management, and presented tutorials on knowledge management and knowledge discovery.

 

Isidore Rigoutsos received his BSc in Physics from the University of Athens, Greece, and his PhD in Computer Science from the Courant Institute of Mathematical Sciences of New York University. Since 1992, he has been with IBM's T.J. Watson Research Center, where he is currently the manager of the Bioinformatics and Pattern Discovery group. His research activities focus on computational biology, invariant descriptors for knowledge representation, applied mathematics and parallel computing. Dr. Rigoutsos is currently an Adjunct Professor at the Courant Institute of Mathematical Sciences and a Visiting Lecturer at the Department of Chemical Engineering of the Massachusetts Institute of Technology, teaching Computational Biology.

 

Aris Floratos is a Research Staff Member at the IBM T.J. Watson Research Center and an Adjunct Professor of Computer Science at the Courant Institute of Mathematical Sciences of New York University. He received his BS from the University of Patras, Greece, Dept. of Computer Science and Engineering (1991) and his MS and PhD from New York University, Dept. of Computer Science (1995 and 1999, respectively). Dr. Floratos's research focuses on the application of computational techniques in the analysis of biological data. His scientific work has appeared in many journals and conferences, and he is the author or co-author of 12 US patents.
