General Schedule
A summary is below. Please Click Here for more details.
A printable version of the program can be downloaded here. (PDF)
| Sunday 28th June |
| 09:00 - 12:30 |
Workshops and Tutorials
Workshop: Data Mining Case Studies and Practice Prize (DMCS #3)
Gabor Melli, Peter van der Putten, Brendan Kitts
Workshop: Data Mining using Matrices and Tensors (DMMT'09)
Chris Ding, Tao Li
Workshop: Human Computation Workshop (HCOMP 2009)
Paul Bennett, Raman Chandrasekar, Max Chickering, Panos Ipeirotis, Edith Law, Foster Provost, Anton Mityagin, Luis von Ahn
The 3rd International Workshop on Knowledge Discovery from Sensor Data (SensorKDD-2009)
Olufemi Omitaomu, Auroop Ganguly, Joao Gama, Ranga Raju Vatsavai, Mohamed Medhat Gaber, Nitesh V. Chawla
The 3rd Workshop on Social Network Mining and Analysis (SNA-KDD)
Lee Giles, John Yen, Prasenjit Mitra, Haizheng Zhang, Igor Perisic
The Third International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD)
Ying Li, Arun C. Surendran, Dou Shen
Workshop on CyberSecurity and Intelligence Informatics (CSI-KDD)
Hsinchun Chen, Marc Dacier, Marie-Francine Moens, Gerhard Paaß, Christopher C. Yang
Workshop on Visual Analytics and Knowledge Discovery (VAKD '09)
Kai Puolamäki, Heikki Mannila, Alessio Bertone, Silvia Miksch, Mark A. Whiting, Jean Scholtz
Workshop: Statistical and Relational Learning and Mining in Bioinformatics (StReBio'09)
Jan Ramon, Fabrizio Costa, Christophe Costa Florencio, Joost Kok
Tutorial: How to do good research, get it published in SIGKDD and get it cited!
Eamonn Keogh
Tutorial: Large Graph-Mining: Power Tools and a Practitioner's Guide
Christos Faloutsos, Gary Miller, Charalampos Tsourakakis
Tutorial: Planning, Running, and Analyzing Controlled Experiments on the Web
Ronny Kohavi, Roger Longbotham, John Quarto-vonTivadar
Tutorial: Statistical Challenges in Computational Advertising
Deepayan Chakrabarti, Deepak Agarwal
|
| 10:00 - 10:30 |
Coffee Break |
| 14:00 - 17:30 |
Workshops and Tutorials
Workshop: KDD-Cup 2009: Fast Scoring on a Large Database (KDDcup09)
Isabelle Guyon, David Vogel
Workshop: The First ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data (U'09)
Jian Pei, Lise Getoor, Ander de Keijzer
Tutorial: Advances in Mining the Web
Myra Spiliopoulou, Osmar Zaiane, Bamshad Mobasher, Olfa Nasraoui -- room tbd
Tutorial: Event Detection
Daniel Neill, Weng-Keen Wong
Tutorial: New Directions in Data Quality Mining
Laure Berti-Equille, Tamraparni Dasu
Tutorial: Predictive Modelling in the Wild: Success Factors in Data Mining Competitions and Real-World Applications
Saharon Rosset, Claudia Perlich
Tutorial: Real World Text Mining
Ronen Feldman, Lyle Ungar
|
| 15:30 - 16:00 |
Coffee Break |
| 18:00 - 18:15 |
Opening Remarks
|
| 18:15 - 18:45 |
Award Presentations
|
| 19:00 - 19:30 |
Innovation Award Talk |
| Monday 29th June |
| 09:00 - 10:00 |
Plenary Invited Talk - Mismatched Models, Wrong Results, and Dreadful Decisions |
| 10:00 - 10:30 |
Coffee Break |
| 10:30 - 12:30 |
4 Parallel Paper Sessions
S01 Clustering, S02 Recommender Systems, S03 Temporal & Streams Mining, S04 Industry: Web Mining
|
| 12:00 - 14:00 |
Lunch (Sponsored by Microsoft adCenter Labs) |
| 14:00 - 15:30 |
4 Parallel Paper Sessions
S05 Social Content, S06 Supervised Learning, S07 Anomaly & Streams, S08 Industry: Mining Web Logs
|
| 15:30 - 16:00 |
Coffee Break |
| 16:00 - 17:30 |
4 Parallel Paper Sessions
S09 Text Mining, S10 Graph Mining, S11 Search & Advertising, S12 Combined: Anomaly Detection
|
| 17:30 - 19:00 |
Industry Sessions
The Europe Media Monitor Family of News Analysis Applications
Organizational Traits leading to High ROI for Data Mining
Improving Online Marketing Performance through Data Mining and Optimization
|
| 19:30 - 22:00 |
Posters at Hôtel de Ville |
| Tuesday 30th June |
| 09:00 - 10:00 |
Plenary Invited Talk - Network Science: An Introduction to Recent Statistical Approaches |
| 10:00 - 10:30 |
Coffee Break |
| 10:30 - 12:00 |
4 Parallel Paper Sessions
S13 Statistical & Consensus Methods, S14 Privacy & Semi-Supervised Methods, S15 Pattern Mining Applications, S16 Industry: Enterprise & Finance
|
| 12:00 - 14:00 |
Lunch and SIGKDD Annual Meeting |
| 14:00 - 15:30 |
4 Parallel Paper Sessions
S17 Social Networks, S18 Web & Behavior Mining, S19 Active Learning, S20 Industry: Data Mining Experiences
|
| 15:30 - 16:00 |
Coffee Break |
| 16:00 - 17:30 |
4 Parallel Paper Sessions
S21 Dynamic Social Networks, S22 Multi-Relational Mining, S23 Combined: Temporal Data, S24 Industry: Security & Privacy
|
| 17:30 - 19:00 |
Panel
Industry Sessions
People aren't always doing what they are saying. Perception versus Reality
The Telecom Revolution: Where does KDD go from here?
Social network analysis for telco operators.
|
| 17:30 - 18:15 |
KDD Transfer Meeting (SIGKDD organizers only) |
| 19:30 - 22:00 |
Posters & Demos
Demo - A Flexible Topic-driven Framework for News Exploration
Juanzi Li, Jun Li, Jie Tang
Demo - Curating and Searching the Annotated Web
Amit Singh, Sayali Kulkarni, Somnath Banerjee, Ganesh Ramakrishnan, Soumen Chakrabarti
Demo - Expert2Bólè: From Expert Finding to Bólè Search
Zi Yang, Jie Tang, Bo Wang, Jingyi Guo, Juanzi Li
Demo - Exploratory Recommender Systems for Sales and Marketing
Michail Vlachos, Abdel Labbi
Demo - Model Monitor: Tracking Model Performance in the Real World
Troy Raeder, Nitesh V. Chawla
Demo - Open Mobile Miner: A Toolkit for Mobile Data Stream Mining
Shonali Krishnaswamy, Mohamed Medhat Gaber, Marian Harbach, Christian Hugues, Abhijat Sinha, Brett Gillick, Pari Delir Haghighi, Arkady Zaslavsky
Demo - OSD: An Online Web Spam Detection System
Bin Zhou, Jian Pei
Demo - SHIFTR: A Fast and Scalable System for Ad Hoc Sensemaking of Large Graphs
Duen Horng Chau, Aniket Kittur, Hanghang Tong, Christos Faloutsos, Jason I. Hong
Demo - Spam Miner: A Platform for Detecting and Characterizing Spam Campaigns
Pedro H. Calais Guerra, Douglas E. V. Pires, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Klaus Steding-Jessen
Demo - Visalix: A Web Application for Visual Data Analysis and Clustering
Loic Lecerf, Boris Chidlovskii
|
| Wednesday 1st July |
| 09:00 - 10:00 |
Plenary Invited Talk - Randomization Methods in Data Mining |
| 10:00 - 10:30 |
Coffee Break |
| 10:30 - 12:30 |
4 Parallel Paper Sessions
S25 Frequent Patterns, S26 Web Mining, S27 Combined: Applications, S28 Industry: Information Extraction & Text Mining
|
Invited Talks
Research Track Invited Speakers
David J. Hand, Imperial College London
Mismatched Models, Wrong Results, and Dreadful Decisions: On choosing appropriate data mining tools
Abstract: Data mining techniques use ‘score functions’ to quantify how well a model fits a given data set. Parameters are estimated by optimising the fit, as measured by the chosen score function, and model choice is guided by the size of the scores for the different models. Since different score functions summarise the fit in different ways, it is important to choose a function which matches the objectives of the data mining exercise. For predictive classification problems, a wide variety of score functions exist, including measures such as precision and recall, the F measure, misclassification rate, the area under the ROC curve (the AUC), and others. The first four of these require a ‘classification threshold’ to be chosen, a choice which may not be easy, or may even be impossible, especially when the classification rule is to be applied in the future. In contrast, the AUC does not require the specification of a classification threshold, but summarises performance over the range of possible threshold choices. However, unfortunately, and despite the widespread use of the AUC, it has a previously unrecognised fundamental incoherence lying at the core of its definition. This means that using the AUC can lead to poor model choice and unnecessary misclassifications. The AUC is set in context, its deficiency explained and the implications illustrated - with the bottom line being that the AUC should not be used. A family of coherent alternative scores is described. The ideas are illustrated with examples from bank loans, fraud, face recognition, and health screening.
Bio: David Hand is Professor of Statistics at Imperial College, London. He studied mathematics at the University of Oxford and statistics and pattern recognition at the University of Southampton. His most recent books are Information Generation: How Data Rule Our World and Statistics: a Very Short Introduction. He launched the journal Statistics and Computing, and served a term of office as editor of Journal of the Royal Statistical Society, Series C. He is currently President of the Royal Statistical Society. He has received various awards and prizes for his research, including the Guy medal of the Royal Statistical Society, a Research Merit Award from the Royal Society, and the IEEE-ICDM Outstanding Contributions Award. He was elected a Fellow of the British Academy in 2003.
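The contrast drawn in the abstract above, between scores that need a classification threshold and the AUC, which does not, can be sketched in a few lines of Python. This is only an illustration, not material from the talk: the toy scores and labels and the function names are invented here, and the sketch does not reproduce the incoherence argument or the alternative scores the abstract mentions.

```python
# Illustrative sketch only: the data and function names below are invented
# for this example and do not come from the talk.

def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as one half. No threshold is involved.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy classifier output: higher score means "more likely positive".
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Precision and recall change with the threshold ...
print(precision_recall(scores, labels, 0.5))
print(precision_recall(scores, labels, 0.75))
# ... while the AUC is a single number for the whole ranking.
print(auc(scores, labels))
```

The same ranking gives different precision and recall at each threshold, whereas the AUC summarises the ranking as a whole.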
Heikki Mannila, Helsinki Institute for Information Technology
Randomization Methods in Data Mining
Abstract: Data mining research has developed many algorithms for various analysis tasks on large and complex datasets. However, assessing the significance of data mining results has received less attention. Analytical methods are rarely available, and hence one has to use computationally intensive methods. Randomization approaches based on null models provide, at least in principle, a general approach that can be used to obtain empirical p-values for various types of data mining approaches. I review some of the recent work in this area, outlining some of the open questions and problems.
Bio: Heikki Mannila is the director of Helsinki Institute for Information Technology HIIT, a joint research institute of University of Helsinki and Helsinki University of Technology TKK, and a professor of computer science at TKK. He has also worked at University of Helsinki, Technical University of Vienna, Max Planck Institute for Computer Science, Microsoft Research, and Nokia Research Center. He has published two books and over 190 refereed articles in computer science and related areas. His specific area of interest is in algorithms for data analysis, and applications in science and in industry. He received the ACM SIGKDD Innovation award in 2003.
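As a rough illustration of the empirical p-values mentioned in the abstract above, the sketch below runs a simple permutation test in Python. The null model (random reassignment of group labels), the test statistic (absolute difference of group means), and all names and data are assumptions chosen for brevity; the talk itself may well consider richer null models and data types.

```python
import random

# Illustrative sketch only: null model, statistic, and data are assumptions
# made for this example, not methods taken from the talk.


def mean_difference(a, b):
    """Test statistic: absolute difference between the two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def empirical_p_value(a, b, n_permutations=999, seed=0):
    """Estimate a p-value by shuffling group membership under the null model."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean_difference(a, b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = mean_difference(pooled[:len(a)], pooled[len(a):])
        # Small tolerance guards against floating-point round-off.
        if stat >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (1 + at_least_as_extreme) / (1 + n_permutations)


group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
group_b = [4.1, 3.9, 4.4, 4.0, 4.6, 4.2]
print(empirical_p_value(group_a, group_b))
```

The cost is one recomputation of the statistic per randomized sample, which is why such methods are described as computationally intensive.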
Stanley Wasserman, Indiana University
Network Science: An Introduction to Recent Statistical Approaches
Abstract: Network science focuses on relationships between social entities. It is used widely in the social and behavioral sciences, as well as in political science, economics, organizational science, and industrial engineering. The social network perspective has been developed over the last sixty years by researchers in psychology, sociology, and anthropology, and more recently, to a lesser extent, in physics. Network science is gaining recognition and standing in the general social and behavioral science communities as the theoretical basis for examining social structures. This basis has been clearly defined by many theorists, and the paradigm convincingly applied to important substantive problems. However, the paradigm requires a new and different set of concepts and analytic tools, beyond those provided by standard quantitative (particularly, statistical) methods. These concepts and tools are the topics of this talk.
Bio: Stanley Wasserman is a Rudy Professor of Statistics, Psychology, and Sociology at Indiana University in Bloomington. Wasserman is best known for his work on statistical models for social networks and for his text, co-authored with Katherine Faust, Social Network Analysis: Methods and Applications. His other books have been published by Sage Publications and Cambridge University Press. He has published widely in sociology, psychology, and statistics journals. He is a fellow of the Royal Statistical Society, and an honorary fellow of the American Statistical Association and the American Association for the Advancement of Science. He has been an Associate Editor of a variety of statistics and methodological journals (Psychometrika, Journal of the American Statistical Association, Sociological Methodology, to name a few), as well as the Book Review Editor of Chance. His research, which focuses primarily on networks, has been supported over the years by NSF, ONR, and NIMH.
Industrial and Government Applications Track Invited Speakers
Ravi Kumar, Yahoo! Research
Mining Web Logs: Applications and Challenges
Abstract: Web logs record the primary interaction of users with web pages in general and search engines in particular. There are two sources for such logs: user trails obtained from toolbars and query/click information obtained from search engines. In this talk we will address the task of mining this rich data to improve user experience on the web. We will illustrate a few applications, together with the modeling and algorithmic challenges that stem from these applications. We will also discuss the privacy issues that arise in this context.
Bio: Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a research staff member at the IBM Almaden Research Center in the Computer Science Principles and Methodologies group. His primary interests are web algorithms, algorithms for large data sets, and theory of computation. He obtained his PhD in Computer Science from Cornell University in December 1997.
Ashok N. Srivastava, NASA Ames Research Center
Data Mining at NASA: from Theory to Applications
Abstract: NASA has some of the largest and most complex data sources in the world, ranging from the earth sciences and space sciences to massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.
Bio: Ashok N. Srivastava is the Principal Investigator of the Integrated Vehicle Health Management Project at NASA, which is an agency-wide role in the NASA Aviation Safety Program. He also leads the Intelligent Data Understanding group at NASA Ames Research Center. The group performs research and development of advanced machine learning and data mining algorithms in support of NASA missions. He also develops new algorithms for studying climate change and the large-scale structure of the universe. He has won numerous awards, including the NASA Exceptional Achievement Medal, one of NASA’s highest awards, the NASA Distinguished Performance Award, several NASA Group Achievement Awards, and the IBM Golden Circle Award.
|