KDD-2000 Sixth ACM SIGKDD International Conference on Knowledge Discovery in Biological Domains
I. Jurisica, I. Rigoutsos and A. Floratos
Abstract: Biological research is generating data at an
explosive rate. The Human Genome Project is expected to determine the sequence
of over 3 billion bases by the year 2003. This will provide the code for about
100,000 proteins. Analyzing this volume of data and using it intelligently is
a challenge because of its complexity, its multiple interdependent factors,
the uncertainty of these dependencies, and the continuous evolution of our
understanding of the data. In
general, reasoning with biomedical information requires flexible knowledge
representation structures and powerful knowledge-discovery tools. This tutorial
provides an introduction to the latest computational techniques for data
mining and knowledge discovery in biological domains. We will explore how well
traditional data-mining techniques for alphanumeric, visual and
relational data fit biology. After characterizing biological problems, we will
present basic definitions and diverse algorithms. These will cover
scientific discovery, pattern identification, organization, summarization and
description, clustering, classifying, associating and predicting, and
information extraction. An overview of current state-of-the-art commercial
and academic systems will be given, with the emphasis on successful
examples of data mining and knowledge discovery in biology. The examples will
include amino acid sequence analysis, homology detection, elucidation of
biological function, protein structure prediction and identification of
related proteins, systematic generation of bio-dictionaries(TM) and their
exploitation, analysis of biological effects, model generation and use, DNA
microarray analysis, data curation, and hypothesis generation and testing. We
will identify limitations of generic approaches and define the problems and issues
that must be addressed to successfully mine biological sequence and structure
databases. We will close by discussing future directions of knowledge
discovery in biology and the relevance of knowledge visualization, knowledge
evolution and the management of scientific knowledge.
Biographies of Organizers: Igor Jurisica received a PhD degree in 1998 from the
University of Toronto, and MSc degrees in Electrical Engineering from Slovak
Technical University and in Computer Science from the University of Toronto
in 1991 and 1993 respectively. He was appointed to FIS as an Assistant
Professor in January 1998 and holds the position of Visiting Scientist at
the IBM Toronto Laboratory, Centre for Advanced Studies. His research
interests are focused on knowledge management and computational biology. In
the past he has worked on industrial projects in biomedical and engineering
domains, co-chaired several workshops on knowledge management, and
presented tutorials on knowledge management and knowledge discovery.
Isidore Rigoutsos received his BSc in Physics from
the University of Athens, Greece, and his PhD in Computer Science from the
Courant Institute of Mathematical Sciences of New York University. Since
1992, he has been with IBM's T.J. Watson Research Center, where he is
currently the manager of the Bioinformatics and Pattern Discovery group. His
research activities focus on computational biology, invariant descriptors for
knowledge representation, applied mathematics and parallel computing. Dr.
Rigoutsos is currently an Adjunct Professor at the Courant Institute of
Mathematical Sciences and a Visiting Lecturer at the Department of Chemical
Engineering of the Massachusetts Institute of Technology, teaching
Computational Biology.
Aris Floratos is a Research Staff Member at the IBM
T.J. Watson Research Center and an Adjunct Professor of Computer Science at
the Courant Institute of Mathematical Sciences of New York University. He
received his BS from the University of Patras, Greece, Dept. of Computer
Science and Engineering (1991) and his MS and PhD from New York
University, Dept. of Computer Science (1995 and 1999 respectively). Dr.
Floratos's research focuses on the application of computational techniques to
the analysis of biological data. His scientific work has appeared in many journals
and conferences, and he is the author or co-author of 12 US patents.