Full Program
Program Highlights
Invited Talks
- Incentive Networks
Prabhakar Raghavan, Yahoo! Research
- Mining the Internet: The Eighth Wonder of the World
Gian Fulgoni, comScore
- The Architecture of Complexity: The structure and the dynamics of networks, from the web to the cell.
Albert-László Barabási, Notre Dame
Research and Industrial/Government Tracks
- 40 research papers in 13 sessions
- 14 industrial/government papers in 3 sessions
- 36 research posters
- 11 industrial/government posters
- 2 panels
- 4 tutorials:
- Introduction to Logistic Regression
- Data Visualization and Mining using the GPU
- Randomized Algorithms for Matrices and Massive Data Sets
- Principles and Applications of Probabilistic Learning
- 9 workshops:
- Data Mining Methods for Anomaly Detection
- OSDM 2005: Open Source Data Mining
- UBDM 2005: Utility-Based Data Mining
- MRDM 2005: Multi-Relational Data Mining
- BIOKDD 2005: Data Mining in Bioinformatics
- DM-SSP 2005: Data Mining Standards, Services, and Platforms
- WebKDD 2005: Taming Evolving, Expanding and Multi-faceted Web Clickstreams
- LinkKDD 2005: Link Discovery: Issues, Approaches and Applications
- Multimedia Data Mining: "Mining Integrated Media and Complex Data"
Saturday, August 20
- 5:00pm-9:00pm (Regency Foyer) Registration
- Summarized Technical Program
Sunday
- 4 Tutorials
- 9 Workshops
- SIGKDD 2005 Opening
- Awards Ceremony
- KDD Cup 2005
Monday
- Invited Talk
- Research Track
- Temporal Mining (3 papers)
- Cost Sensitive Learning (2 papers)
- Privacy (3 papers)
- Streaming Data (2 papers)
- Industrial/Government Track
- 1 Best Research Paper
- 1 Best Applications Paper
- 1 Best Student Paper, 1 Runner-up
- Poster Highlights
- Poster Session and Reception
Tuesday
- Invited Talk
- Research Track
- Ensemble Learning (3 papers)
- Graph Mining (3 papers)
- Clustering (4 papers)
- Support Vector Machines (3 papers)
- Clustering and Grouping (4 papers)
- Text and Web Mining (4 papers)
- Industrial/Government Track
- Sequence Mining (3 papers)
- Anomaly Detection (4 papers)
- 1 Panel
Wednesday
- 1 Invited Talk
- 1 Panel
- Research Track
- Associations (3 papers)
- Novel Learning Algorithms (3 papers)
- Industrial/Government Track
- Document Analysis (3 papers)
The Program
Saturday, August 20:
| |
| 5:00pm-9:00pm |
Registration |
East Concourse Area |
Sunday, August 21:
| |
| 7:30am-8:00pm |
Registration (ongoing) |
Regency Foyer |
| |
| 8:30am-4:30pm |
Full-Day Workshops |
|
| Data Mining Methods for Anomaly Detection | Crystal B |
| UBDM 2005: Utility-Based Data Mining | Crystal C |
| LinkKDD 2005: Link Discovery: Issues, Approaches and Applications | Plaza B |
| |
| 9:00am-4:30pm |
Full-Day Workshops |
|
| OSDM 2005: Open Source Data Mining | Wrigley |
| MRDM 2005: Multi-Relational Data Mining | Toronto |
| BIOKDD 2005: Data Mining in Bioinformatics | Crystal A |
| DM-SSP 2005: Data Mining Standards, Services, and Platforms | Comiskey |
| WebKDD 2005: Taming Evolving, Expanding and Multi-faceted Web Clickstreams | Plaza A |
| Multimedia Data Mining: Mining Integrated Media and Complex Data | Acapulco |
| |
| 9:00am-12:00pm |
Tutorial |
Regency A |
| Introduction to Logistic Regression. Dave Lewis, David D. Lewis Consulting |
| |
| 9:00am-12:00pm |
Tutorial |
Regency B |
| Randomized Algorithms for Matrices and Massive Data Sets. Petros Drineas, Rensselaer Polytechnic Institute; Michael W. Mahoney, Yale University |
| |
| 10:00am-10:30am |
Coffee Break |
Regency Foyer |
| |
| 12:00pm-1:30pm |
Lunch |
The Riverside Center West |
| |
| 1:30pm-4:30pm |
Tutorial |
Regency A |
| Data Visualization and Mining using the GPU. Sudipto Guha, University of Pennsylvania; Shankar Krishnan, AT&T Labs; Suresh Venkatasubramanian, AT&T Labs |
| |
| 1:30pm-4:30pm |
Tutorial |
Regency B |
| Principles and Applications of Probabilistic Learning. Padhraic Smyth, University of California at Irvine |
| |
| 3:00pm-3:30pm |
Coffee Break |
Regency Foyer |
| |
| 4:30pm-5:00pm |
Break |
|
| |
| 5:00pm-5:45pm |
KDD Opening and Awards |
Crystal Ballroom |
Robert Grossman, General Chair; Roberto Bayardo, Kristin Bennett, Program Chairs; Daniela Raicu, Student Awards Chair; Gregory Piatetsky, SIGKDD Chair
|
| |
| 5:45pm-6:15pm |
KDD Service Award Presentation |
Crystal Ballroom |
| |
| 6:15pm-7:15pm |
KDD Cup Awards |
Crystal Ballroom |
| Ying Li, Zijian Zheng, KDD Cup Chairs |
Monday, August 22:
| |
| 7:30am-5:00pm |
Registration (ongoing) |
Regency Foyer |
| |
| 7:00am-8:30am |
Continental Breakfast-sponsored by SAS |
Regency Foyer |
| |
| 8:30am-10:00am |
Invited Talk |
Crystal Ballroom |
| Session Chair: Roberto Bayardo. Incentive Networks. Prabhakar Raghavan
Abstract: We propose a notion of incentive networks, modeling online settings in which multiple participants in a network help each other find information. Within this general setting, we study query incentive networks, a natural abstraction of question-answering systems with rewards for finding answers. We analyze strategic behavior in such networks and, under a simple model of networks, show that the Nash equilibrium for participants' strategies exhibits an unexpected threshold phenomenon. (Joint work with Jon Kleinberg.) |
| |
| 9:00am-5:00pm |
Exhibits |
Regency C&D |
| |
| 10:00am-10:30am |
Coffee Break |
Regency Foyer |
| |
| 10:30am-12:00pm |
Industrial/Govt Track Session 1 [E-Commerce] |
Regency A |
| Chair: Ronny Kohavi |
| Price Prediction and Insurance for Online Auctions. Rayid Ghani |
| Predicting Product Purchase Patterns of Corporate Customers. Bhavani Raskutti, Alan Herschtal |
| Enhancing the Lift Curve Under Budget Constraints: An Application in the Mutual Fund Industry. Lian Yan, Michael Fassino, Patrick Baldasare, Robert Hull |
| |
| 10:30am-12:00pm |
Research Track Session 1 [Temporal Mining] |
Regency B |
| Chair: Jian Pei |
| Finding Partial Orders from Unordered 0-1 Data. Antti Ukkonen, Mikael Fortelius, Heikki Mannila |
| Detection of Emerging Space-Time Clusters. Daniel Neill, Andrew Moore, Maheshkumar Sabhnani, Kenny Daniel |
| Probabilistic Workflow Mining. Ricardo Silva, Jiji Zhang, James G. Shanahan |
| |
| 10:30am-12:00pm |
Research Track Session 2 [Privacy] |
Plaza A |
| Chair: Ramakrishnan Srikant |
| A New Scheme on Privacy-Preserving Data Classification. Nan Zhang, Shengquan Wang, Wei Zhao |
| Anonymity-Preserving Data Collection. Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright |
| A Distributed Learning Framework Based on Probabilistic Models. Srujana Merugu, Joydeep Ghosh |
| |
| 12:00pm-1:30pm |
Lunch-sponsored by Yahoo! Research Labs |
The Riverside Center West |
| |
| 1:30pm-2:30pm |
Research Track Session 3 [Best Student Papers] |
Regency A |
| Chair: Gautam Das |
| Query Chains: Learning to Rank from Implicit Feedback. Filip Radlinski and Thorsten Joachims |
| Summarizing Itemset Patterns: A Profile-Based Approach. Xifeng Yan, Hong Cheng, Dong Xin, and Jiawei Han |
| |
| 1:30pm-2:30pm |
Research Track Session 4 [Cost Sensitive Learning] |
Regency B |
| Chair: Marko Grobelnik |
| Local Sparsity Control for Naïve Bayes with Extreme Misclassification Costs. Aleksander Kolcz |
| Combining Email Models for False Positive Reduction. Shlomo Hershkop, Salvatore Stolfo |
| |
| 1:30pm-2:30pm |
Research Track Session 5 [Streaming Data] |
Plaza A |
| Chair: Petros Drineas |
| Streaming Feature Selection Using Alpha Investing. Jing Zhou, Dean Foster, Robert Stine, and Lyle Ungar |
| Wavelet Synopsis for Data Streams: Minimizing non-Euclidean Error. Sudipto Guha and Boulos Harb |
| |
| 2:30pm-3:30pm |
Paper Award Talks Best Paper Award |
Crystal Ballroom |
| Session Chair: Kristin Bennett |
| BEST RESEARCH PAPER AWARD |
| Graphs Over Time: Densification Laws, Shrinking Diameters, and Possible Explanations. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos |
| BEST APPLICATIONS PAPER AWARD |
| A Hit-Miss Model for Duplicate Detection in the WHO Drug Safety Database. G. Niklas Noren, Roland Orre and Andrew Bate |
| |
| 3:30pm-4:15pm |
Plenary Poster Presentations |
Crystal Ballroom |
| |
| 4:15pm-4:45pm |
Coffee Break |
Regency Foyer |
| |
| 4:45pm-5:45pm |
Plenary Poster Presentations |
Crystal Ballroom |
| |
| 6:15pm-7:00pm |
Buses begin leaving for Field Museum |
|
| |
| 7:00pm-10:00pm |
Poster Reception-sponsored by Fair Isaac |
Field Museum |
| Note: Attendees will be able to visit all Field Museum areas except special exhibits. |
Tuesday, August 23:
| |
| 7:30am-5:00pm |
Registration (ongoing) |
Regency Foyer |
| |
| 7:00am-8:30am |
Continental Breakfast-sponsored by SPSS |
Regency Foyer |
| |
| 8:30am-10:00am |
Invited Talk |
Crystal Ballroom |
| Session Chair: Robert Grossman. Mining the Internet: The Eighth Wonder of the World. Gian Fulgoni
Abstract: The Internet takes behavioral consumer research to a new level by providing the ability to passively and continuously monitor the complete online behavior of millions of consumers in an opt-in, privacy-protected manner. Imagine the analytical possibilities if every site visited, every page viewed, every piece of content seen, and every transaction conducted (all of this granularity in behavior) were continuously captured, with explicit consumer permission, for millions of consumers, and privacy was protected. What unique insights could one gain into consumers' behavior, their interests, passions and lifestyles? What behavior could be predicted? What commercial applications would be possible? |
| |
| 9:00am-5:00pm |
Exhibits |
Regency C&D |
| |
| 10:00am-10:30am |
Coffee Break |
Regency Foyer |
| |
| 10:30am-12:00pm |
Industrial/Govt Track Session 2 [Sequence Mining] |
Regency B |
| Chair: Myra Spiliopoulou |
| Exploiting Retrieval Measures in the Early Stages of Mining Evolving Web Clickstreams. Olfa Nasraoui, Cesar Cardona, Carlos Rojas |
| Email Data Cleaning. Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang |
| Modeling and Predicting Personal Information Dissemination Behavior. Xiaodan Song, Ching-Yung Lin, Belle L. Tseng, Ming-Ting Sun |
| |
| 10:30am-12:00pm |
Research Track Session 6 [Ensemble Learning] |
Regency A |
| Chair: Jennifer Dy |
| Robust Boosting and its relation to bagging. Saharon Rosset |
| Feature Bagging for Outlier Detection. Aleksandar Lazarevic, Vipin Kumar |
| Combining Partitions by Probabilistic Label Aggregation. Tilman Lange, Joachim Buhmann |
| |
| 10:30am-12:00pm |
Research Track Session 7 [Graph Mining] |
Plaza A/B |
| Chair: Tina Eliassi-Rad |
| Mining Tree queries in a graph. Bart Goethals, Eveline Hoekx, Jan Van den Bussche |
| On Mining Cross-Graph Quasi-Cliques. Jian Pei, Daxin Jiang, Aidong Zhang |
| Mining Closed Relational Graphs with Connectivity Constraints. Xifeng Yan, X. Jasmine Zhou, Jiawei Han |
| |
| 12:00pm-2:00pm |
SIGKDD Business Lunch-sponsored by Microsoft SQL Server |
The Riverside Center West |
| |
| 2:00pm-4:00pm |
Research Track Session 8 [Clustering] |
Regency A |
| Chair: Sugato Basu |
| Dimension Induced Clustering. Aris Gionis, Alexander Hinneburg, Spiros Papadimitriou, Panayiotis Tsaparas |
| On the Use of Linear Programming for Unsupervised Text Classification. Mark Sandler |
| A General Model for Clustering Binary Data. Tao Li |
| Consistent Bipartite Graph Co-Partitioning for Star-Structured High-Order Heterogeneous Data Co-Clustering. Bin GAO, Tie-Yan LIU, Xin Zheng, Qian-sheng Chen, Wei-Ying MA |
| |
| 2:00pm-3:30pm |
Research Track Session 9 [Support Vector Machines] |
Regency B |
| Chair: Dave Musicant |
| SVM Selective Sampling for Ranking with Application to Data Retrieval. Hwanjo Yu |
| Rule Extraction from Hyperplane-based Classifiers. Glenn Fung, Sathyakama Sandilya, Bharat Rao |
| Nomograms for Visualizing Support Vector Machines. Aleks Jakulin, Martin Mozina, Janez Demsar, Ivan Bratko, Blaz Zupan |
| |
| 2:00pm-3:30pm |
Panel |
Crystal B |
| Moderator: Prabhakar Raghavan, Yahoo! Research |
| Title: Text mining: the discipline that never was.
Panelists:
Andrei Broder, IBM
Natalie Glance, Intelliseek
Jon Kleinberg, Cornell
Abstract: Hundreds of papers later, we are still unable to define just what text mining
is. Is there a definitive, valuable discipline here with firm scientific
foundations? Or is it too nascent to tell? Or is it just a special case of
structured data mining? Is it just IR re-invented or is there something new
here?
Join our panelists in debating this audience-interactive panel. |
| |
| 3:30pm-4:00pm |
Coffee Break |
Regency Foyer |
| |
| 4:00pm-6:00pm |
Research Track Session 10 [Clustering and Grouping] |
Regency A |
| Chair: Wei Wang |
| Non-Redundant Clustering with Conditional Ensembles. David Gondek, Thomas Hofmann |
| Cross-Relational Clustering with User's Guidance. Xiaoxin Yin, Jiawei Han, Philip Yu |
| Sampling-Based Sequential Subgroup Mining. Martin Scholz |
| Simple and Effective Visual Models for Gene Expression Cancer Diagnostics. Gregor Leban, Minca Mramor, Ivan Bratko, Blaz Zupan |
| |
| 4:00pm-6:00pm |
Industrial/Govt Track Session 3 [Anomaly Detection] |
Regency B |
| Chair: Valery A. Petrushin |
| Dynamic Syslog Mining for Network Failure Monitoring. Kenji Yamanishi, Yuko Maruyama |
| Learning to Predict Train Wheel Failures. Chunsheng Yang, Sylvain Letourneau |
| Using Relational Knowledge Discovery to Prevent Securities Fraud. Jennifer Neville, Ozgur Simsek, David Jensen, John Komoroske, Kelly Palmer, Henry Goldberg |
| An Approach to Spacecraft Anomaly Detection Problem Using Kernel Feature Space. Ryohei Fujimaki, Takehisa Yairi, Kazuo MACHIDA |
| |
| 4:00pm-6:00pm |
Research Track Session 11 [Text and Web Mining] |
Plaza A/B |
| Chair: Dmitry Pavlov |
| The Predictive Power of Online Chatter. Daniel Gruhl, R. Guha, Ravi Kumar, Jasmine Novak, Andrew Tomkins |
| Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining. Qiaozhu Mei, ChengXiang |
| Variable Latent Semantic Indexing. Anirban Dasgupta, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins |
| Web Object Indexing Using Domain Knowledge. Muyuan Wang, Zhiwei Li, Lie Lu, Wei-Ying MA, Naiyao Zhang |
| |
| 6:00pm-6:45pm |
KDD Transfer Meeting (organizing committee only) |
Regency A |
| |
| 7:15pm-10:30pm |
Program Committee and Organizing Committee Dinner (by invitation only). See Peter Caron for details. |
|
Wednesday, August 24:
| |
| 7:30am-8:30am |
Continental Breakfast-sponsored by Teradata |
Regency Foyer |
| |
| 8:30am-9:30am |
Plenary Panel: |
Regency C&D |
| Moderator: George John, Yahoo! |
| Selling Vitamins Instead of Aspirin: the Data Mining Adoption Challenge |
| Ultimately computer scientists trade in bits. Some build boxes that
take in bits, remember them, then give them back when asked, maybe
adding up a few numbers in the process. This is the database and
enterprise applications business, $100B/year in revenues. Other
computer scientists build boxes that take in bits on one end and shoot
them out the other end. This is the networking business, also
$100B/year.
As data miners, we build boxes that take in bits, perform magical
computations, and create models that can actually predict future
behavior and events in a way that allows a business to significantly
grow revenues or reduce costs, or we discover structure or patterns that
allow knowledge workers or scientists to make more rapid progress
towards significant discoveries.
So why isn't KDD also a $100B business? Where is our Bill Gates, our
Larry Ellison, our Cisco, our SAP? Does Usama Fayyad have a house with
a trampoline in his 13th bedroom? Does Jim Goodnight race in the
America's Cup?
Did you take a vitamin today? The last time you had a bad headache, did
you take an aspirin?
At the Vitamins vs Aspirin panel, representatives from Fortune 500
companies will give their views on prioritizing investments in data
mining, representatives from data mining companies will describe the ups
and downs of corporate adoption, and we will get to the bottom of how to
make sure everyone takes their vitamins.
Usama Fayyad, Yahoo!
Robert Grossman, Open Data Partners and University of Illinois at Chicago
Ronny Kohavi, Microsoft |
| |
| 9:30am-10:30am |
Invited Talk |
Regency C&D |
| Session Chair: Christos Faloutsos. The architecture of complexity: The structure and the dynamics of networks from the web to the cell. Albert-László Barabási
Abstract: Networks with complex topology describe systems as diverse as the cell, the World Wide Web, or society. The emergence of most networks is driven by self-organizing processes that are governed by simple but generic laws. The analysis of the cellular network of various organisms shows that cells and complex man-made networks, such as the Internet or the World Wide Web, and many social and collaboration networks share the same large-scale topology. I will show that the scale-free topology of these complex webs has important consequences for their robustness against failures and attacks, with implications for drug design, the Internet's ability to survive attacks and failures, and the ability of ideas and innovations to spread on the network. |
| |
| 10:30am-11:00am |
Coffee Break |
Regency Foyer |
| |
| 11:00am-12:30pm |
Industrial/Govt Track Session 4 [Document Analysis] |
Regency A |
| Chair: Gabor Melli |
| Finding Similar Files in Large Document Repositories. George Forman, Kave Eshghi, Stephane Chiocchetti |
| Making Holistic Schema Matching Robust: An Ensemble Approach. Bin He, Kevin Chen-Chuan Chang |
| Deriving Marketing Intelligence from Online Discussion. Natalie Glance, Matthew Hurst, Kamal Nigam, Matthew Siegler, Robert Stockton, Takashi Tomokiyo |
| |
| 11:00am-12:30pm |
Research Track Session 12 [Associations] |
Regency B |
| Chair: Bing Liu |
| Reasoning about Sets using Redescription Mining. Mohammed Zaki, Naren Ramakrishnan |
| Improving Discriminative Sequential Learning with Rare-but-Important Associations. Phan Xuan-Hieu, Nguyen Le-Minh, Ho Tu-Bao, Horiguchi Susumu |
| A Multiple Tree Algorithm for the Efficient Association of Asteroid Observations. Jeremy Kubica, Andrew Moore, Andrew Connolly, Robert Jedicke |
| |
| 11:00am-12:30pm |
Research Track Session 13 [Novel Learning Algorithms] |
Plaza A/B |
| Chair: Dan Simovici |
| Fast Discovery of Unexpected Patterns in Data Relative to a Bayesian Network. Szymon Jaroszewicz, Tobias Scheffer |
| A Bayesian Network Classifier with Inverse Tree Structure for Voxelwise Magnetic Resonance Image Analysis. Rong Chen, Edward Herskovits |
| Mining Images on Semantics via Statistical Learning. Jianping Fan, Mohand-Said Hacid |
|