Tremendous recent advances in the processing power, storage capacity, and interconnectivity of computer technology are creating unprecedented quantities of digital data. Data mining, the science of extracting useful knowledge from such huge data repositories, has emerged as a young, interdisciplinary field in computer science. Data mining techniques have been widely applied to problems in industry, science, engineering, and government, and it is widely believed that data mining will have a profound impact on our society. The growing consensus that data mining can bring real value has led to an explosion in demand for novel data mining technologies and for students trained in data mining: students who understand data mining techniques, can apply them to real-life problems, and are prepared for research and development of new data mining methods. Courses in data mining have started to spring up all over the world.
In response to this development of the field, the ACM SIGKDD Executive Committee has set up the ACM SIGKDD Curriculum Committee to design a sample curriculum for data mining that gives recommendations for educating the next generation of students in data mining. Feedback from researchers, educators, and students has convinced us that it is important to have a carefully designed, conceptually strong, technically rich, and balanced curriculum for this discipline. A comprehensive and balanced curriculum will ensure that education in data mining sets a solid foundation for the healthy growth of the field. It will also promote systematic training of students in computer science, information sciences, and other related fields, and it will provide guidance for training the next generation of data mining researchers, developers, and technology users.
The Curriculum Committee is composed of university professors and researchers who have actively contributed to data mining research and education, researchers and practitioners from industry who have rich experience in applying data mining technology, and administrators from government agencies. This report is the first draft from the Intensive Working Group of the Committee. We expect that this draft will be extensively reviewed and revised, and we look forward to suggestions and recommendations from the Committee and from the general data mining research, development, and application community.
The remainder of this report is structured as follows. Section 2 outlines the principles that guided our selection of material. Section 3 briefly describes the prerequisites that we assume students of our proposed curriculum to have. Section 4 contains the core of this document, our curriculum proposal.
Intensive Working Group of ACM SIGKDD Curriculum Committee:
Soumen Chakrabarti, Martin Ester, Usama Fayyad, Johannes Gehrke, Jiawei Han, Shinichi Morishita, Gregory Piatetsky-Shapiro, Wei Wang