KDD 2009: Program

General Schedule

A summary is below. Please Click Here for more details.

A printable version of the program can be downloaded here. (PDF)

Sunday 28th June
09:00 - 12:30	Workshops and Tutorials
	Workshop: Data Mining Case Studies and Practice Prize (DMCS #3) Gabor Melli, Peter van der Putten, Brendan Kitts
	Workshop: Data Mining using Matrices and Tensors (DMMT'09) Chris Ding, Tao Li
	Workshop: Human Computation Workshop (HCOMP 2009) Paul Bennett, Raman Chandrasekar, Max Chickering, Panos Ipeirotis, Edith Law, Foster Provost, Anton Mityagin, Luis von Ahn
	The 3rd International Workshop on Knowledge Discovery from Sensor Data (SensorKDD-2009) Olufemi Omitaomu, Auroop Ganguly, Joao Gama, Ranga Raju Vatsavai, Mohamed Medhat Gaber, Nitesh V. Chawla
	The 3rd Workshop on Social Network Mining and Analysis (SNA-KDD) Lee Giles, John Yen, Prasenjit Mitra, Haizheng Zhang, Igor Perisic
	The Third International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD) Ying Li, Arun C. Surendran, Dou Shen
	Workshop on CyberSecurity and Intelligence Informatics (CSI-KDD) Hsinchun Chen, Marc Dacier, Marie-Francine Moens, Gerhard Paaß, Christopher C. Yang
	Workshop on Visual Analytics and Knowledge Discovery (VAKD '09) Kai Puolamäki, Heikki Mannila, Alessio Bertone, Silvia Miksch, Mark A. Whiting, Jean Scholtz
	Workshop: Statistical and Relational Learning and Mining in Bioinformatics (StReBio'09) Jan Ramon, Fabrizio Costa, Christophe Costa Florencio, Joost Kok
	Tutorial: How to do good research, get it published in SIGKDD and get it cited! Eamonn Keogh
	Tutorial: Large Graph-Mining: Power Tools and a Practitioner's Guide Christos Faloutsos, Gary Miller, Charalampos Tsourakakis
	Tutorial: Planning, Running, and Analyzing Controlled Experiments on the Web Ronny Kohavi, Roger Longbotham, John Quarto-vonTivadar
	Tutorial: Statistical Challenges in Computational Advertising Deepayan Chakrabarti, Deepak Agarwal
10:00 - 10:30	Coffee Break
14:00 - 17:30	Workshops and Tutorials
	Workshop: KDD-Cup 2009: Fast Scoring on a Large Database (KDDcup09) Isabelle Guyon, David Vogel
	Workshop: The First ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data (U'09) Jian Pei, Lise Getoor, Ander de Keijzer
	Tutorial: Advances in Mining the Web Myra Spiliopoulou, Osmar Zaiane, Bamshad Mobasher, Olfa Nasraoui (room TBD)
	Tutorial: Event Detection Daniel Neill, Weng-Keen Wong
	Tutorial: New Directions in Data Quality Mining Laure Berti-Equille, Tamraparni Dasu
	Tutorial: Predictive Modelling in the Wild: Success Factors in Data Mining Competitions and Real-World Applications Saharon Rosset, Claudia Perlich
	Tutorial: Real World Text Mining Ronen Feldman, Lyle Ungar
15:30 - 16:00	Coffee Break
18:00 - 18:15	Opening Remarks
18:15 - 18:45	Award Presentations
19:00 - 19:30	Innovation Award Talk
Monday 29th June
09:00 - 10:00	Plenary Invited Talk - Mismatched Models, Wrong Results, and Dreadful Decisions
10:00 - 10:30	Coffee Break
10:30 - 12:30	4 Parallel Paper Sessions S01 Clustering, S02 Recommender Systems, S03 Temporal & Streams Mining, S04 Industry: Web Mining
12:00 - 14:00	Lunch (Sponsored by Microsoft adCenter Labs)
14:00 - 15:30	4 Parallel Paper Sessions S05 Social Content, S06 Supervised Learning, S07 Anomaly & Streams, S08 Industry: Mining Web Logs
15:30 - 16:00	Coffee Break
16:00 - 17:30	4 Parallel Paper Sessions S09 Text Mining, S10 Graph Mining, S11 Search & Advertising, S12 Combined: Anomaly Detection
17:30 - 19:00	Industry Sessions
	The Europe Media Monitor Family of News Analysis Applications
	Organizational Traits leading to High ROI for Data Mining
	Improving Online Marketing Performance through Data Mining and Optimization
19:30 - 22:00	Posters at Hôtel de Ville
Tuesday 30th June
09:00 - 10:00	Plenary Invited Talk - Network Science: An Introduction to Recent Statistical Approaches
10:00 - 10:30	Coffee Break
10:30 - 12:00	4 Parallel Paper Sessions S13 Statistical & Consensus Methods, S14 Privacy & Semi-Supervised Methods, S15 Pattern Mining Applications, S16 Industry: Enterprise & Finance
12:00 - 14:00	Lunch and SIGKDD Annual Meeting
14:00 - 15:30	4 Parallel Paper Sessions S17 Social Networks, S18 Web & Behavior Mining, S19 Active Learning, S20 Industry: Data Mining Experiences
15:30 - 16:00	Coffee Break
16:00 - 17:30	4 Parallel Paper Sessions S21 Dynamic Social Networks, S22 Multi-Relational Mining, S23 Combined: Temporal Data, S24 Industry: Security & Privacy
17:30 - 19:00	Panel Industry Sessions People aren't always doing what they are saying. Perception versus Reality The Telecom Revolution: Where does KDD go from here? Social network analysis for telco operators.
17:30 - 18:15	KDD Transfer Meeting (SIGKDD organizers only)
19:30 - 22:00	Posters & Demos
	Demo - A Flexible Topic-driven Framework for News Exploration Juanzi Li, Jun Li, Jie Tang
	Demo - Curating and Searching the Annotated Web Amit Singh, Sayali Kulkarni, Somnath Banerjee, Ganesh Ramakrishnan, Soumen Chakrabarti
	Demo - Expert2Bólè: From Expert Finding to Bólè Search Zi Yang, Jie Tang, Bo Wang, Jingyi Guo, Juanzi Li
	Demo - Exploratory Recommender Systems for Sales and Marketing Michail Vlachos, Abdel Labbi
	Demo - Model Monitor: Tracking Model Performance in the Real World Troy Raeder, Nitesh V. Chawla
	Demo - Open Mobile Miner: A Toolkit for Mobile Data Stream Mining Shonali Krishnaswamy, Mohamed Medhat Gaber, Marian Harbach, Christian Hugues, Abhijat Sinha, Brett Gillick, Pari Delir Haghighi, Arkady Zaslavsky
	Demo - OSD: An Online Web Spam Detection System Bin Zhou, Jian Pei
	Demo - SHIFTR: A Fast and Scalable System for Ad Hoc Sensemaking of Large Graphs Duen Horng Chau, Aniket Kittur, Hanghang Tong, Christos Faloutsos, Jason I. Hong
	Demo - Spam Miner: A Platform for Detecting and Characterizing Spam Campaigns Pedro H. Calais Guerra, Douglas E. V. Pires, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Klaus Steding-Jessen
	Demo - Visalix: A Web Application for Visual Data Analysis and Clustering Loic Lecerf, Boris Chidlovskii
Wednesday 1st July
09:00 - 10:00	Plenary Invited Talk - Randomization Methods in Data Mining
10:00 - 10:30	Coffee Break
10:30 - 12:30	4 Parallel Paper Sessions S25 Frequent Patterns, S26 Web Mining, S27 Combined: Applications, S28 Industry: Information Extraction & Text Mining

Invited Talks

Research Track Invited Speakers

David J. Hand, Imperial College London
Mismatched Models, Wrong Results, and Dreadful Decisions: On choosing appropriate data mining tools

Abstract: Data mining techniques use ‘score functions’ to quantify how well a model fits a given data set. Parameters are estimated by optimising the fit, as measured by the chosen score function, and model choice is guided by the size of the scores for the different models. Since different score functions summarise the fit in different ways, it is important to choose a function which matches the objectives of the data mining exercise. For predictive classification problems, a wide variety of score functions exist, including measures such as precision and recall, the F measure, misclassification rate, the area under the ROC curve (the AUC), and others. The first four of these require a ‘classification threshold’ to be chosen, a choice which may not be easy, or may even be impossible, especially when the classification rule is to be applied in the future. In contrast, the AUC does not require the specification of a classification threshold, but summarises performance over the range of possible threshold choices. However, unfortunately, and despite the widespread use of the AUC, it has a previously unrecognised fundamental incoherence lying at the core of its definition. This means that using the AUC can lead to poor model choice and unnecessary misclassifications. The AUC is set in context, its deficiency explained and the implications illustrated - with the bottom line being that the AUC should not be used. A family of coherent alternative scores is described. The ideas are illustrated with examples from bank loans, fraud, face recognition, and health screening.
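
Illustration (not part of the talk): the contrast the abstract draws between threshold-dependent score functions and the AUC can be sketched in a few lines of Python. The scores, labels, and function names below are made up for illustration. The sketch only shows that precision, recall, the F measure, and misclassification rate change with the chosen threshold while the AUC needs no threshold; it does not reproduce the speaker's argument about the AUC's incoherence or his proposed alternative scores.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(scores, labels, threshold):
    """Score functions that require a classification threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    error_rate = (fp + fn) / len(labels)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "error_rate": error_rate}


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. No threshold is needed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Hypothetical classifier scores and true labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for t in (0.5, 0.25):
    print(t, threshold_metrics(scores, labels, t))
print("AUC:", auc(scores, labels))
```

Moving the threshold from 0.5 to 0.25 changes precision and recall on the same scores, whereas the AUC is a single number for the whole ranking.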

Bio: David Hand is Professor of Statistics at Imperial College, London. He studied mathematics at the University of Oxford and statistics and pattern recognition at the University of Southampton. His most recent books are Information Generation: How Data Rule Our World and Statistics: a Very Short Introduction. He launched the journal Statistics and Computing, and served a term of office as editor of Journal of the Royal Statistical Society, Series C. He is currently President of the Royal Statistical Society. He has received various awards and prizes for his research, including the Guy medal of the Royal Statistical Society, a Research Merit Award from the Royal Society, and the IEEE-ICDM Outstanding Contributions Award. He was elected a Fellow of the British Academy in 2003.

Heikki Mannila, Helsinki Institute for Information Technology
Randomization Methods in Data Mining

Abstract: Data mining research has developed many algorithms for various analysis tasks on large and complex datasets. However, assessing the significance of data mining results has received less attention. Analytical methods are rarely available, and hence one has to use computationally intensive methods. Randomization approaches based on null models provide, at least in principle, a general approach that can be used to obtain empirical p-values for various types of data mining approaches. I review some of the recent work in this area, outlining some of the open questions and problems.
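
Illustration (not part of the talk): a minimal sketch of how a randomization approach can yield an empirical p-value. It assumes the simplest possible null model, in which group labels are exchangeable, and a made-up test statistic (difference of group means). The function names and data are hypothetical, and the null models used in the work the speaker reviews may be considerably more elaborate.

```python
import random


def mean_difference(values, groups):
    """Test statistic: mean of group 1 minus mean of group 0."""
    a = [v for v, g in zip(values, groups) if g == 1]
    b = [v for v, g in zip(values, groups) if g == 0]
    return sum(a) / len(a) - sum(b) / len(b)


def empirical_p_value(values, groups, statistic,
                      n_randomizations=999, seed=0):
    """One-sided empirical p-value under a label-shuffling null model.

    The group labels are shuffled repeatedly; the p-value is the share of
    randomized data sets whose statistic is at least as large as the
    observed one. The +1 terms count the observed data as one sample.
    """
    rng = random.Random(seed)
    observed = statistic(values, groups)
    shuffled = list(groups)
    at_least_as_extreme = 0
    for _ in range(n_randomizations):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            at_least_as_extreme += 1
    return (1 + at_least_as_extreme) / (1 + n_randomizations)


# Hypothetical measurements for two groups of five items each.
values = [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

p = empirical_p_value(values, groups, mean_difference)
print("empirical p-value:", p)
```

Passing the statistic as a function reflects the generality the abstract mentions: the same loop applies to any result for which a null model can be sampled.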

Bio: Heikki Mannila is the director of Helsinki Institute for Information Technology HIIT, a joint research institute of University of Helsinki and Helsinki University of Technology TKK, and a professor of computer science at TKK. He has also worked at University of Helsinki, Technical University of Vienna, Max Planck Institute for Computer Science, Microsoft Research, and Nokia Research Center. He has published two books and over 190 refereed articles in computer science and related areas. His specific area of interest is in algorithms for data analysis, and applications in science and in industry. He received the ACM SIGKDD Innovation award in 2003.

Stanley Wasserman, Indiana University
Network Science: An Introduction to Recent Statistical Approaches

Abstract: Network science focuses on relationships between social entities. It is used widely in the social and behavioral sciences, as well as in political science, economics, organizational science, and industrial engineering. The social network perspective has been developed over the last sixty years by researchers in psychology, sociology, and anthropology, and more recently, to a lesser extent, in physics. Network science is gaining recognition and standing in the general social and behavioral science communities as the theoretical basis for examining social structures. This basis has been clearly defined by many theorists, and the paradigm convincingly applied to important substantive problems. However, the paradigm requires a new and different set of concepts and analytic tools, beyond those provided by standard quantitative (particularly, statistical) methods. These concepts and tools are the topics of this talk.

Bio: Stanley Wasserman is a Rudy Professor of Statistics, Psychology, and Sociology at Indiana University in Bloomington. Wasserman is best known for his work on statistical models for social networks and for his text, co-authored with Katherine Faust, Social Network Analysis: Methods and Applications. His other books have been published by Sage Publications and Cambridge University Press. He has published widely in sociology, psychology, and statistics journals. He is a fellow of the Royal Statistical Society, and an honorary fellow of the American Statistical Association and the American Association for the Advancement of Science. He has been an Associate Editor of a variety of statistics and methodological journals (Psychometrika, Journal of the American Statistical Association, Sociological Methodology, to name a few), as well as the Book Review Editor of Chance. His research, which focuses primarily on networks, has been supported over the years by NSF, ONR, and NIMH.

Industrial and Government Applications Track Invited Speakers

Ravi Kumar, Yahoo! Research
Mining Web Logs: Applications and Challenges

Abstract: Web logs record the primary interaction of users with web pages in general and search engines in particular. There are two sources for such logs: user trails obtained from toolbars and query/click information obtained from search engines. In this talk we will address the task of mining this rich data to improve user experience on the web. We will illustrate a few applications, together with the modeling and algorithmic challenges that stem from these applications. We will also discuss the privacy issues that arise in this context.

Bio: Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a research staff member at the IBM Almaden Research Center in the Computer Science Principles and Methodologies group. His primary interests are web algorithms, algorithms for large data sets, and theory of computation. He obtained his PhD in Computer Science from Cornell University in December 1997.

Ashok N. Srivastava, NASA Ames Research Center
Data Mining at NASA: from Theory to Applications

Abstract: NASA has some of the largest and most complex data sources in the world, ranging from the earth and space sciences to massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.

Bio: Ashok N. Srivastava is the Principal Investigator of the Integrated Vehicle Health Management Project at NASA, which is an agency-wide role in the NASA Aviation Safety Program. He also leads the Intelligent Data Understanding group at NASA Ames Research Center. The group performs research and development of advanced machine learning and data mining algorithms in support of NASA missions. He also develops new algorithms for studying climate change and the large-scale structure of the universe. He has won numerous awards, including the NASA Exceptional Achievement Medal, one of NASA’s highest awards, the NASA Distinguished Performance Award, several NASA Group Achievement Awards, and the IBM Golden Circle Award.