Full Program

Program Highlights

Invited Talks

  • Incentive Networks
    Prabhakar Raghavan, Yahoo! Research
  • Mining the Internet: The Eighth Wonder of the World
    Gian Fulgoni, comScore
  • The Architecture of Complexity: The structure and the dynamics of networks, from the web to the cell.
    Albert-László Barabási, Notre Dame

    Research and Industrial/Government Tracks

    • 40 research papers in 13 sessions
    • 14 industrial/government papers in 3 sessions
    • 36 research posters
    • 11 industrial/government posters
    • 2 panels
    • 4 tutorials:
      • Introduction to Logistic Regression
      • Data Visualization and Mining using the GPU
      • Randomized Algorithms for Matrices and Massive Data Sets
      • Principles and Applications of Probabilistic Learning
    • 9 workshops:
      • Data Mining Methods for Anomaly Detection
      • OSDM 2005: Open Source Data Mining
      • UBDM 2005: Utility-Based Data Mining
      • MRDM 2005: Multi-Relational Data Mining
      • BIOKDD 2005: Data Mining in Bioinformatics
      • DM-SSP 2005: Data Mining Standards, Services, and Platforms
      • WebKDD 2005: Taming Evolving, Expanding and Multi-faceted Web Clickstreams
      • LinkKDD 2005: Link Discovery: Issues, Approaches and Applications
      • Multimedia Data Mining: "Mining Integrated Media and Complex Data"

    Saturday, August 20

    • 5:00pm-9:00pm (Regency Foyer) Registration

    Summarized Technical Program

    Sunday

    • 4 Tutorials
    • 9 Workshops
    • SIGKDD 2005 Opening
    • Awards Ceremony
    • KDD Cup 2005

    Monday

    • Invited Talk
    • Research Track
      • Temporal Mining (3 papers)
      • Cost Sensitive Learning (2 papers)
      • Privacy (3 papers)
      • Streaming Data (2 papers)
    • Industrial/Government Track
      • E-Commerce (3 papers)
    • 1 Best Research Paper
    • 1 Best Applications Paper
    • 1 Best Student Paper, 1 Runner-up
    • Poster Highlights
    • Poster Session and Reception

    Tuesday

    • Invited Talk
    • Research Track
      • Ensemble Learning (3 papers)
      • Graph Mining (3 papers)
      • Clustering (4 papers)
      • Support Vector Machines (3 papers)
      • Clustering and Grouping (4 papers)
      • Text and Web Mining (4 papers)
    • Industrial/Government Track
      • Sequence Mining (3 papers)
      • Anomaly Detection (4 papers)
    • 1 Panel

    Wednesday

    • 1 Invited Talk
    • 1 Panel
    • Research Track
      • Associations (3 papers)
      • Novel Learning Algorithms (3 papers)
    • Industrial/Government Track
      • Document Analysis (3 papers)

    The Program


    Saturday, August 20:

    5:00pm-9:00pm Registration East Concourse Area

    Sunday, August 21:

    7:30am-8:00pm Registration (ongoing) Regency Foyer
    8:30am-4:30pm Full-Day Workshops
    Data Mining Methods for Anomaly Detection Crystal B
    UBDM 2005: Utility-Based Data Mining Crystal C
    LinkKDD 2005: Link Discovery: Issues, Approaches and Applications Plaza B
    9:00am-4:30pm Full-Day Workshops
    OSDM 2005: Open Source Data Mining Wrigley
    MRDM 2005: Multi-Relational Data Mining Toronto
    BIOKDD 2005: Data Mining in Bioinformatics Crystal A
    DM-SSP 2005: Data Mining Standards, Services, and Platforms Comiskey
    WebKDD 2005: Taming Evolving, Expanding and Multi-faceted Web Clickstreams Plaza A
    Multimedia Data Mining: Mining Integrated Media and Complex Data Acapulco
    9:00am-12:00pm Tutorial Regency A
    Introduction to Logistic Regression. Dave Lewis, David D. Lewis Consulting
    9:00am-12:00pm Tutorial Regency B
    Randomized Algorithms for Matrices and Massive Data Sets. Petros Drineas, Rensselaer Polytechnic Institute Michael W. Mahoney, Yale University
    10:00am-10:30am Coffee Break Regency Foyer
    12:00pm-1:30pm Lunch The Riverside Center West
    1:30pm-4:30pm Tutorial Regency A
    Data Visualization and Mining using the GPU. Sudipto Guha, University of Pennsylvania Shankar Krishnan, AT&T Labs Suresh Venkatasubramanian, AT&T Labs
    1:30pm-4:30pm Tutorial Regency B
    Principles and Applications of Probabilistic Learning. Padhraic Smyth, University of California at Irvine
    3:00pm-3:30pm Coffee Break Regency Foyer
    4:30pm-5:00pm Break
    5:00pm-5:45pm KDD Opening and Awards Crystal Ballroom
    Robert Grossman, General Chair
    Roberto Bayardo, Kristin Bennett, Program Chairs
    Daniela Raicu, Student Awards Chair
    Gregory Piatetsky, SIGKDD Chair
    5:45pm-6:15pm KDD Service Award Presentation Crystal Ballroom
    6:15pm-7:15pm KDD Cup Awards Crystal Ballroom
    Ying Li, Zijian Zheng, KDD Cup Chairs

    Monday, August 22:

    7:30am-5:00pm Registration (ongoing) Regency Foyer
    7:00am-8:30am Continental Breakfast-sponsored by SAS Regency Foyer
    8:30am-10:00am Invited Talk Crystal Ballroom
    Session Chair: Roberto Bayardo

    Incentive Networks
    Prabhakar Raghavan

    Abstract: We propose a notion of incentive networks, modeling online settings in which multiple participants in a network help each other find information. Within this general setting, we study query incentive networks, a natural abstraction of question-answering systems with rewards for finding answers. We analyze strategic behavior in such networks and under a simple model of networks, show that the Nash equilibrium for participants' strategies exhibits an unexpected threshold phenomenon. (Joint work with Jon Kleinberg.)

    9:00am-5:00pm Exhibits Regency C&D
    10:00am-10:30am Coffee Break Regency Foyer
    10:30am-12:00pm Industrial/Govt Track Session 1 [E-Commerce] Regency A
    Chair: Ronny Kohavi
    Price Prediction and Insurance for Online Auctions. Rayid Ghani
    Predicting Product Purchase Patterns of Corporate Customers. Bhavani Raskutti, Alan Herschtal
    Enhancing the Lift Curve Under Budget Constraints: An Application in the Mutual Fund Industry. Lian Yan, Michael Fassino, Patrick Baldasare, Robert Hull
    10:30am-12:00pm Research Track Session 1 [Temporal Mining] Regency B
    Chair: Jian Pei
    Finding Partial Orders from Unordered 0-1 Data. Antti Ukkonen, Mikael Fortelius, Heikki Mannila
    Detection of Emerging Space-Time Clusters. Daniel Neill, Andrew Moore, Maheshkumar Sabhnani, Kenny Daniel
    Probabilistic Workflow Mining. Ricardo Silva, Jiji Zhang, James G. Shanahan
    10:30am-12:00pm Research Track Session 2 [Privacy] Plaza A
    Chair: Ramakrishnan Srikant
    A New Scheme on Privacy-Preserving Data Classification. Nan Zhang, Shengquan Wang, Wei Zhao
    Anonymity-Preserving Data Collection. Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright
    A Distributed Learning Framework Based on Probabilistic Models. Srujana Merugu, Joydeep Ghosh
    12:00pm-1:30pm Lunch-sponsored by Yahoo! Research Labs The Riverside Center West
    1:30pm-2:30pm Research Track Session 3 [Best Student Papers] Regency A
    Chair: Gautam Das
    Query Chains: Learning to Rank from Implicit Feedback. Filip Radlinski and Thorsten Joachims
    Summarizing Itemset Patterns: A Profile-Based Approach. Xifeng Yan, Hong Cheng, Dong Xin, and Jiawei Han
    1:30pm-2:30pm Research Track Session 4 [Cost Sensitive Learning] Regency B
    Chair: Marko Grobelnik
    Local Sparsity Control for Naïve Bayes with Extreme Misclassification Costs. Aleksander Kolcz
    Combining Email Models for False Positive Reduction. Shlomo Hershkop, Salvatore Stolfo
    1:30pm-2:30pm Research Track Session 5 [Streaming Data] Plaza A
    Chair: Petros Drineas
    Streaming Feature Selection Using Alpha Investing. Jing Zhou, Dean Foster, Robert Stine, and Lyle Ungar
    Wavelet Synopsis for Data Streams: Minimizing non-Euclidean Error. Sudipto Guha and Boulos Harb
    2:30pm-3:30pm Paper Award Talks Best Paper Award Crystal Ballroom
    Session Chair: Kristin Bennett
    BEST RESEARCH PAPER AWARD
    Graphs Over Time: Densification Laws, Shrinking Diameters, and Possible Explanations. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos
    BEST APPLICATIONS PAPER AWARD
    A Hit-Miss Model for Duplicate Detection in the WHO Drug Safety Database. G. Niklas Noren, Roland Orre and Andrew Bate
    3:30pm-4:15pm Plenary Poster Presentations Crystal Ballroom
    4:15pm-4:45pm Coffee Break Regency Foyer
    4:45pm-5:45pm Plenary Poster Presentations Crystal Ballroom
    6:15pm-7:00pm Buses begin leaving for Field Museum
    7:00pm-10:00pm Poster Reception-sponsored by Fair Isaac Field Museum
    Note: Attendees will be able to visit all Field Museum areas except special exhibits.

    Tuesday, August 23:

    7:30am-5:00pm Registration (ongoing) Regency Foyer
    7:00am-8:30am Continental Breakfast-sponsored by SPSS Regency Foyer
    8:30am-10:00am Invited Talk Crystal Ballroom
    Session Chair: Robert Grossman

    Mining the Internet: The Eighth Wonder of the World
    Gian Fulgoni

    Abstract: The Internet takes behavioral consumer research to a new level by providing the ability to passively and continuously monitor the complete online behavior of millions of consumers in an opt-in, privacy-protected manner. Imagine the analytical possibilities if every site visited, every page viewed, content seen, transaction conducted (all of this granularity in behavior) was continuously captured with explicit consumer permission for millions of consumers and privacy was protected. What unique insights could one gain into consumers' behavior, their interests, passions and lifestyles? What behavior could be predicted? What commercial applications would be possible?

    9:00am-5:00pm Exhibits Regency C&D
    10:00am-10:30am Coffee Break Regency Foyer
    10:30am-12:00pm Industrial/Govt Track Session 2 [Sequence Mining] Regency B
    Chair: Myra Spiliopoulou
    Exploiting Retrieval Measures in the Early Stages of Mining Evolving Web Clickstreams. Olfa Nasraoui, Cesar Cardona, Carlos Rojas
    Email Data Cleaning. Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
    Modeling and Predicting Personal Information Dissemination Behavior. Xiaodan Song, Ching-Yung Lin, Belle L. Tseng, Ming-Ting Sun
    10:30am-12:00pm Research Track Session 6 [Ensemble Learning] Regency A
    Chair: Jennifer Dy
    Robust Boosting and Its Relation to Bagging. Saharon Rosset
    Feature Bagging for Outlier Detection. Aleksandar Lazarevic, Vipin Kumar
    Combining Partitions by Probabilistic Label Aggregation. Tilman Lange, Joachim Buhmann
    10:30am-12:00pm Research Track Session 7 [Graph Mining] Plaza A/B
    Chair: Tina Eliassi-Rad
    Mining Tree Queries in a Graph. Bart Goethals, Eveline Hoekx, Jan Van den Bussche
    On Mining Cross-Graph Quasi-Cliques. Jian Pei, Daxin Jiang, Aidong Zhang
    Mining Closed Relational Graphs with Connectivity Constraints. Xifeng Yan, X. Jasmine Zhou, Jiawei Han
    12:00pm-2:00pm SIGKDD Business Lunch-sponsored by Microsoft SQL Server The Riverside Center West
    2:00pm-4:00pm Research Track Session 8 [Clustering] Regency A
    Chair: Sugato Basu
    Dimension Induced Clustering. Aris Gionis, Alexander Hinneburg, Spiros Papadimitriou, Panayiotis Tsaparas
    On the Use of Linear Programming for Unsupervised Text Classification. Mark Sandler
    A General Model for Clustering Binary Data. Tao Li
    Consistent Bipartite Graph Co-Partitioning for Star-Structured High-Order Heterogeneous Data Co-Clustering. Bin GAO, Tie-Yan LIU, Xin Zheng, Qian-sheng Chen, Wei-Ying MA
    2:00pm-3:30pm Research Track Session 9 [Support Vector Machines] Regency B
    Chair: Dave Musicant
    SVM Selective Sampling for Ranking with Application to Data Retrieval. Hwanjo Yu
    Rule Extraction from Hyperplane-based Classifiers. Glenn Fung, Sathyakama Sandilya, Bharat Rao
    Nomograms for Visualizing Support Vector Machines. Aleks Jakulin, Martin Mozina, Janez Demsar, Ivan Bratko, Blaz Zupan
    2:00pm-3:30pm Panel Crystal B
    Moderator: Prabhakar Raghavan, Yahoo! Research
    Title: Text mining: the discipline that never was.

    Panelists:
    Andrei Broder, IBM
    Natalie Glance, Intelliseek
    Jon Kleinberg, Cornell

    Abstract: Hundreds of papers later, we are still unable to define just what text mining is. Is there a definitive, valuable discipline here with firm scientific foundations? Or is it too nascent to tell? Or is it just a special case of structured data mining? Is it just IR re-invented or is there something new here?

    Join our panelists in debating this audience-interactive panel.

    3:30pm-4:00pm Coffee Break Regency Foyer
    4:00pm-6:00pm Research Track Session 10 [Clustering and Grouping] Regency A
    Chair: Wei Wang
    Non-Redundant Clustering with Conditional Ensembles. David Gondek, Thomas Hofmann
    Cross-Relational Clustering with User's Guidance. Xiaoxin Yin, Jiawei Han, Philip Yu
    Sampling-Based Sequential Subgroup Mining. Martin Scholz
    Simple and Effective Visual Models for Gene Expression Cancer Diagnostics. Gregor Leban, Minca Mramor, Ivan Bratko, Blaz Zupan
    4:00pm-6:00pm Industrial/Govt Track Session 3 [Anomaly Detection] Regency B
    Chair: Valery A. Petrushin
    Dynamic Syslog Mining for Network Failure Monitoring. Kenji Yamanishi, Yuko Maruyama
    Learning to Predict Train Wheel Failures. Chunsheng Yang, Sylvain Letourneau
    Using Relational Knowledge Discovery to Prevent Securities Fraud. Jennifer Neville, Ozgur Simsek, David Jensen, John Komoroske, Kelly Palmer, Henry Goldberg
    An Approach to Spacecraft Anomaly Detection Problem Using Kernel Feature Space. Ryohei Fujimaki, Takehisa Yairi, Kazuo MACHIDA
    4:00pm-6:00pm Research Track Session 11 [Text and Web Mining] Plaza A/B
    Chair: Dmitry Pavlov
    The Predictive Power of Online Chatter. Daniel Gruhl, R. Guha, Ravi Kumar, Jasmine Novak, Andrew Tomkins
    Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining. Qiaozhu Mei, ChengXiang
    Variable Latent Semantic Indexing. Anirban Dasgupta, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins
    Web Object Indexing Using Domain Knowledge. Muyuan Wang, Zhiwei Li, Lie Lu, Wei-Ying MA, Naiyao Zhang
    6:00pm-6:45pm KDD Transfer Meeting (organizing committee only) Regency A
    7:15pm-10:30pm Program Committee and Organizing Committee Dinner (by invitation only) See Peter Caron for details.

    Wednesday, August 24:

    7:30am-8:30am Continental Breakfast-sponsored by Teradata Regency Foyer
    8:30am-9:30am Plenary Panel: Regency C&D
    Moderator: George John, Yahoo!
    Selling Vitamins Instead of Aspirin: the Data Mining Adoption Challenge
    Ultimately computer scientists trade in bits. Some build boxes that take in bits, remember them, then give them back when asked, maybe adding up a few numbers in the process. This is the database and enterprise applications business, $100B/year in revenues. Other computer scientists build boxes that take in bits on one end and shoot them out the other end. This is the networking business, also $100B/year.

    As data miners, we build boxes that take in bits, perform magical computations, and create models that can actually predict future behavior and events in a way that allows a business to significantly grow revenues or reduce costs, or we discover structure or patterns that allow knowledge workers or scientists to make more rapid progress towards significant discoveries.

    So why isn't KDD also a $100B business? Where is our Bill Gates, our Larry Ellison, our Cisco, our SAP? Does Usama Fayyad have a house with a trampoline in his 13th bedroom? Does Jim Goodnight race in the America's Cup?

    Did you take a vitamin today? The last time you had a bad headache, did you take an aspirin?

    At the Vitamins vs Aspirin panel, representatives from Fortune 500 companies will give their views on prioritizing investments in data mining, representatives from data mining companies will describe the ups and downs of corporate adoption, and we will get to the bottom of how to make sure everyone takes their vitamins.

    Usama Fayyad, Yahoo!
    Robert Grossman, Open Data Partners and University of Illinois at Chicago
    Ronny Kohavi, Microsoft

    9:30am-10:30am Invited Talk Regency C&D
    Session Chair: Christos Faloutsos

    The architecture of complexity: The structure and the dynamics of networks from the web to the cell.
    Albert-László Barabási

    Abstract: Networks with complex topology describe systems as diverse as the cell, the World Wide Web or society. The emergence of most networks is driven by self-organizing processes that are governed by simple but generic laws. The analysis of the cellular network of various organisms shows that cells and complex man-made networks, such as the Internet or the World Wide Web, and many social and collaboration networks share the same large-scale topology. I will show that the scale-free topology of these complex webs has important consequences for their robustness against failures and attacks, with implications for drug design, the Internet's ability to survive attacks and failures, and the ability of ideas and innovations to spread on the network.

    10:30am-11:00am Coffee Break Regency Foyer
    11:00am-12:30pm Industrial/Govt Track Session 4 [Document Analysis] Regency A
    Chair: Gabor Melli
    Finding Similar Files in Large Document Repositories. George Forman, Kave Eshghi, Stephane Chiocchetti
    Making Holistic Schema Matching Robust: An Ensemble Approach. Bin He, Kevin Chen-Chuan Chang
    Deriving Marketing Intelligence from Online Discussion. Natalie Glance, Matthew Hurst, Kamal Nigam, Matthew Siegler, Robert Stockton, Takashi Tomokiyo
    11:00am-12:30pm Research Track Session 12 [Associations] Regency B
    Chair: Bing Liu
    Reasoning about Sets using Redescription Mining. Mohammed Zaki, Naren Ramakrishnan
    Improving Discriminative Sequential Learning with Rare-but-Important Associations. Phan Xuan-Hieu, Nguyen Le-Minh, Ho Tu-Bao, Horiguchi Susumu
    A Multiple Tree Algorithm for the Efficient Association of Asteroid Observations. Jeremy Kubica, Andrew Moore, Andrew Connolly, Robert Jedicke
    11:00am-12:30pm Research Track Session 13 [Novel Learning Algorithms] Plaza A/B
    Chair: Dan Simovici
    Fast Discovery of Unexpected Patterns in Data Relative to a Bayesian Network. Szymon Jaroszewicz, Tobias Scheffer
    A Bayesian Network Classifier with Inverse Tree Structure for Voxelwise Magnetic Resonance Image Analysis. Rong Chen, Edward Herskovits
    Mining Images on Semantics via Statistical Learning. Jianping Fan, Mohand-Said Hacid