Program Information


Saturday, August 11th
5:00 pm - 9:00 pm | Registration
Sunday, August 12th
7:30 am - 8:00 pm | Registration (all day)
9:00 am - 5:00 pm | Full Day Workshops (KDD Cup Workshop will start at 8:45 am)
9:00 am - 12:00 pm | Morning Half Day Workshops
9:00 am - 12:00 pm | Morning Tutorials
10:00 am - 10:30 am | Coffee Break
12:00 pm - 2:00 pm | Lunch (on your own)
2:00 pm - 5:00 pm | Afternoon Half Day Workshops
2:00 pm - 5:00 pm | Afternoon Tutorials
3:00 pm - 3:30 pm | Coffee Break
6:00 pm - 6:15 pm | Opening Remarks by Pavel Berkhin, Rich Caruana and Xindong Wu
6:15 pm - 6:45 pm | ACM SIGKDD Award Presentations:
* KDD-07 Best Paper Awards, Thorsten Joachims
* Student Travel Awards, Kamal Nigam
* KDD Cup Winners, Bing Liu
* SIGKDD Service and Innovation Awards, Ramasamy Uthurusamy
6:45 pm - 7:30 pm | ACM SIGKDD Innovation Award Talk by Usama Fayyad
Monday, August 13th
7:30 am - 8:00 pm | Registration (all day)
7:30 am - 5:30 pm | Exhibits
7:30 am - 9:00 am | Continental Breakfast
9:00 am - 10:00 am | Invited Talk #1 - Jon Kleinberg, Cornell University
10:00 am - 10:30 am | Coffee Break
10:30 am - 12:30 pm | Research Sessions R1, R2, R3 (6 papers per session, 120 min)
10:30 am - 12:30 pm | Industry Session I1
12:30 pm - 2:00 pm | Lunch
2:00 pm - 3:20 pm | Research Sessions R4, R5, R6 (4 papers per session, 80 min)
2:00 pm - 3:20 pm | Industry Session I2
3:20 pm - 3:50 pm | Break
3:50 pm - 5:10 pm | Research Sessions R7, R8, R9 (4 papers per session, 80 min)
3:50 pm - 5:10 pm | Industry Session I3
6:15 pm - 9:15 pm | Poster Reception and Demonstration Session at the San Jose Museum of Art
Tuesday, August 14th
7:30 am - 5:00 pm | Registration (all day)
7:30 am - 8:00 pm | Exhibits
7:30 am - 9:00 am | Continental Breakfast
9:00 am - 10:00 am | Invited Talk #2 - Usama Fayyad, Yahoo!
10:00 am - 10:30 am | Coffee Break
10:30 am - 11:50 am | Research Sessions R10, R11, R12 (4 papers per session, 80 min)
10:30 am - 11:50 am | Industry Session I4
12:00 pm - 2:00 pm | SIGKDD Business Lunch
2:00 pm - 3:20 pm | Research Sessions R13, R14, R15 (4 papers per session, 80 min)
2:00 pm - 3:20 pm | Industry Session I5
3:20 pm - 3:50 pm | Coffee Break
3:50 pm - 5:10 pm | Research Sessions R16, R17, R18 (4 papers per session, 80 min)
3:50 pm - 5:10 pm | Birds of a Feather (BOF)
5:15 pm - 6:15 pm | KDD Transfer Meeting (KDD-07 and KDD-08 organizers)
5:40 pm - 8:00 pm | Second Poster Reception, Fairmont Hotel
8:00 pm - 10:00 pm | Program Committee and Organizing Committee Dinner (by invitation only)
Wednesday, August 15th
7:30 am - 9:00 am | Continental Breakfast
9:00 am - 10:40 am | Research Sessions R19, R20, R21 (5 papers per session, 100 min)
9:00 am - 10:40 am | Panel Discussion
10:40 am - 11:15 am | Coffee Break
11:15 am - 12:15 pm | Invited Talk #3 - Chris Anderson, Wired Magazine
12:15 pm - 12:30 pm | Closing Remarks by Pavel Berkhin


Jon Kleinberg, Cornell University
Monday 9:00 am ~ 10:00 am, Imperial

  • Challenges in Social Network Data: Processes, Privacy and Paradoxes

    The proliferation of rich social media, on-line communities, and collectively produced knowledge resources has accelerated the convergence of technological and social networks, producing environments that reflect both the architecture of the underlying information systems and the social structure of their members. In studying the consequences of these developments, we are faced with the opportunity to analyze social network data at unprecedented levels of scale and temporal resolution; this has led to a growing body of research at the intersection of the computing and social sciences.

    We discuss some of the current challenges in the analysis of large-scale social network data, focusing on two themes in particular: the inference of social processes from data, and the problem of maintaining individual privacy in studies of social networks. While early research on this type of data focused on structural questions, recent work has extended this to consider the social processes that unfold within the networks. Particular lines of investigation have focused on processes in on-line social systems related to communication, community formation, information-seeking and collective problem-solving, marketing, the spread of news, and the dynamics of popularity. There are a number of fundamental issues, however, for which we have relatively little understanding, including the extent to which the outcomes of these types of social processes are predictable from their early stages, the differences between properties of individuals and properties of aggregate populations in these types of data, and the extent to which similar social phenomena in different domains have uniform underlying explanations.

    The second theme we pursue is concerned with the problem of privacy. While much of the research on large-scale social systems has been carried out on data that is public, some of the richest emerging sources of social interaction data come from settings such as e-mail, instant messaging, or phone communication in which users have strong expectations of privacy. How can such data be made available to researchers while protecting the privacy of the individuals represented in the data? Many of the standard approaches here are variations on the principle of anonymization: the names of individuals are replaced with meaningless unique identifiers, so that the network structure is maintained while private information has been suppressed.

    In recent joint work with Lars Backstrom and Cynthia Dwork, we have identified some fundamental limitations on the power of network anonymization to ensure privacy. In particular, we describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes. The attacks are based on the uniqueness of small random subgraphs embedded in an arbitrary network, using ideas related to those found in arguments from Ramsey theory. Combined with other recent examples of privacy breaches in data containing rich textual or time-series information, these results suggest that anonymization contains pitfalls even in very simple settings. In this way, our approach can be seen as a step toward understanding how techniques of privacy-preserving data mining can inform how we think about the protection of even the most skeletal social network data.

    Usama Fayyad, Yahoo!
    Tuesday 9:00 am ~ 10:00 am, Imperial

  • From Mining the Web to Inventing the New Sciences Underlying the Internet

    As the Internet continues to change the way we live, find information, communicate, and do business, it has also been taking on a dramatically increasing role in marketing and advertising. Unlike any prior mass medium, the Internet is interactive and offers the ability to target and program messaging at the individual level. This interactivity, coupled with the richness of the data available for measurement, the variety of ways to use that data, and the heavy dependence of effective marketing on data-driven applications, makes data mining and statistical data analysis, modeling, and reporting an essential, mission-critical part of running an online business.

    However, because of its novelty and the scale of the data sets involved, few companies have figured out how to properly make use of this data. In this talk, I will review some of the challenges and opportunities in the utilization of data to drive this new generation of marketing systems. I will provide several examples of how data is utilized in critical ways to drive some of these capabilities. The discussion will be framed within the more general framework of Grand Challenges for data mining, both pragmatic and technical.

    I will conclude this presentation with a consideration of the larger issues surrounding the Internet as a technology that is ubiquitous in our lives, yet one where very little is understood, at the scientific level, about many of the basics the Internet enables: Community, Personalization, and the new Microeconomics of the web. This leads to an overview of the new Yahoo! Research organization and its aims: inventing the new sciences underlying what we do on the Internet, focusing on areas that have received little attention in traditional academic circles. Some illustrative examples will be reviewed to make the ultimate goals more concrete.

    Chris Anderson, Wired Magazine
    Wednesday 11:15 am ~ 12:15 pm, Imperial

  • Calculating Latent Demand in the Long Tail

    Chris Anderson is the author of the New York Times bestselling book The Long Tail: Why the Future of Business is Selling Less of More, which was published in 2006, and runs a blog on the subject at longtail.com. In 2007 he was named one of the "Time 100," the newsmagazine's list of the 100 men and women whose power, talent or moral example is transforming the world.

    Previously, he was at The Economist, where he served as U.S. Business Editor, Asia Business Editor (based in Hong Kong), and Technology Editor. He started The Economist's Internet coverage in 1994 and directed its initial web strategy. Mr. Anderson's media career began at the two premier science journals, Nature and Science, where he served in several editorial capacities. Prior to that he worked as a researcher at Los Alamos National Laboratory's meson physics facility and served as research assistant to the Chief Scientist of the Department of Transportation. He holds a Bachelor of Science degree in Physics from George Washington University and studied Quantum Mechanics and Science Journalism at the University of California at Berkeley.


    R1: Web/Text Mining (I)

    Monday 10:30 am ~ 12:30 pm, Regency 1

  • Kr676 | Information Genealogy: Uncovering the Flow of Ideas in Non-Hyperlinked Document Databases | Benyah Shaparenko and Thorsten Joachims
  • Kr497 | Upping the Baseline for High-Precision Text Classifiers | Aleksander Kolcz and Wen-Tau Yih
  • Kr567 | Extracting Semantic Relations from Query Logs | Ricardo Baeza-Yates and Alessandro Tiberi
  • Kr722 | Multiscale Topic Tomography | Ramesh Nallapati, William W. Cohen, Susan Ditmore, John Lafferty, and Kin Ung
  • Kr734 | A Concept-based Model for Enhancing Text Categorization | Shady Shehata, Fakhri Karray, and Mohamed Kamel
  • Kr806 | Expertise modeling for matching papers with reviewers | David Mimno and Andrew McCallum

    R2: Graph Mining and Social Networks

    Monday 10:30 am ~ 12:30 pm, Regency 2

  • Kr710 | Fast Direction-Aware Proximity for Graph Mining | Hanghang Tong, Yehuda Koren, and Christos Faloutsos
  • Kr346 | Correlation Search in Graph Databases | Yiping Ke, James Cheng, and Wilfred Ng
  • Kr431 | SCAN: A Structural Clustering Algorithm for Networks | Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas A. J. Schweiger
  • Kr652 | A Framework For Community Identification in Dynamic Social Networks | Chayant Tantipathananandh, Tanya Y. Berger-Wolf, and David Kempe
  • Kr700 | Fast Best-Effort Pattern Matching in Large Attributed Graphs | Hanghang Tong, Brian Gallagher, Christos Faloutsos, and Tina Eliassi-Rad
  • Kr781 | Temporal Causal Modeling with Graphical Granger Methods | Andrew Arnold, Yan Liu, and Naoki Abe

    R3: Filtering and Ranking

    Monday 10:30 am ~ 12:30 pm, Crystal

  • Kr288 | Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing | Seung-Taek Park and David Pennock
  • Kr414 | Use of Ranked Cross Document Evidence Trails for Hypothesis Generation | Rohini Srihari, Li Xu, and Tushar Saxena
  • Kr791 | A Learning Framework using Green's Function and Kernel Regularization with Application to Recommender System | Chris Ding, Rong Jin, Tao Li, and Horst Simon
  • Kr679 | Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems | Robert Bell, Yehuda Koren, and Chris Volinsky
  • Kr827 | Real-time Ranking with Concept Drift Using Expert Advice | Hila Becker and Marta Arias
  • Kr702 | Active Exploration for Learning Rankings from Clickthrough Data | Filip Radlinski and Thorsten Joachims

    R4: Web/Text Mining (II)

    Monday 2:00 pm ~ 3:20 pm, Regency 1

  • Kr812 | Development of NeuroElectroMagnetic Ontologies (NEMO): A Framework for Mining Brain Wave Ontologies | Dejing Dou, Gwen Frishkoff, Jiawei Rong, Robert Frank, Allen Malony, and Don Tucker
  • Kr301 | Exploiting Duality in Summarization with Deterministic Guarantees | Panagiotis Karras, Dimitris Sacharidis, and Nikos Mamoulis
  • Kr465 | Webpage Understanding: an Integrated Approach | Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, and Hsiao-Wuen Hon
  • Kr490 | Show me the money! Deriving the Pricing Power of Product Features by Mining Consumer Reviews | Nikolay Archak, Anindya Ghose, and Panagiotis Ipeirotis

    R5: Classification (I)

    Monday 2:00 pm ~ 3:20 pm, Regency 2

  • Kr259 | Support Feature Machine for Classification of Abnormal Brain Activity | W. Art Chaovalitwongse, Ya-Ju Fan, and Rajesh Sachdeo
  • Kr641 | Automatic Labeling of Multinomial Topic Models | Qiaozhu Mei, Xuehua Shen, and ChengXiang Zhai
  • Kr348 | Mining Statistically Important Equivalence Classes | Jinyan Li, Guimei Liu, and Limsoon Wong
  • Kr504 | Local Decomposition for Rare Class Analysis | Junjie Wu, Hui Xiong, Peng Wu, and Jian Chen

    R6: Clustering (I)

    Monday 2:00 pm ~ 3:20 pm, Crystal

  • Kr218 | The Minimum Consistent Subset Cover Problem and its Applications in Data Mining | Byron J. Gao, Martin Ester, Jin-Yi Cai, Oliver Schulte, and Hui Xiong
  • Kr335 | Co-clustering based Classification for Out-of-domain Documents | Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu
  • Kr448 | Efficient Incremental Clustering with Constraints | Ian Davidson, S.S. Ravi, and Martin Ester
  • Kr565 | A Probabilistic Framework for Relational Clustering | Bo Long, Zhongfei Zhang, and Philip S. Yu

    R7: Web/Text Mining (III)

    Monday 3:50 pm ~ 5:10 pm, Regency 1

  • Kr555 | Knowledge Discovery of Multiple-topic Document using Parametric Mixture Model with Dirichlet Prior | Issei Sato and Hiroshi Nakagawa
  • Kr689 | Tracking Multiple Topics for Finding Interesting Articles | Raymond Pon, Alfonso Cardenas, David Buttler, and Terence Critchlow
  • Kr693 | Efficient and Effective Explanation of Change in Hierarchical Summaries | Deepak Agarwal, Dhiman Barman, Dimitrios Gunopulos, Flip Korn, Divesh Srivastava, and Neal Young
  • Kr712 | Content-based Document Routing and Index Partitioning for Scalable Similarity-based Searches in a Large Corpus | Deepavali Bhagwat, Kave Eshghi, and Pankaj Mehra

    R8: Pattern Discovery (I)

    Monday 3:50 pm ~ 5:10 pm, Regency 2

  • Kr276 | Trajectory Pattern Mining | Fosca Giannotti, Mirco Nanni, Dino Pedreschi, and Fabio Pinelli
  • Kr374 | Finding low-entropy sets and trees from binary data | Hannes Heikinheimo, Eino Hinkkanen, Heikki Mannila, Taneli Mielikäinen, and Jouni Seppänen
  • Kr322 | Detecting Motifs Under Uniform Scaling | Dragomir Yankov, Eamonn Keogh, Jose Medina, Bill Chiu, and Victor Zordan
  • Kr502 | Mining Favorable Facets | Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, and Ke Wang

    R9: Clustering (II)

    Monday 3:50 pm ~ 5:10 pm, Crystal

  • Kr777 | Evolutionary Spectral Clustering by Incorporating Temporal Smoothness | Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino, and Belle Tseng
  • Kr205 | Using Hierarchical Clustering for Learning | Vincent Schickel and Boi Faltings
  • Kr405 | Nestedness and segmented nestedness | Heikki Mannila and Evimaria Terzi
  • Kr412 | XProj: A Framework for Projected Structural Clustering of XML Documents | Charu Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, and Mohammed Zaki

    R10: Web/Text Mining (IV)

    Tuesday 10:30 am ~ 11:50 am, Regency 1

  • Kr751 | Detecting research topics via the correlation between graphs and texts | Yookyung Jo, Carl Lagoze, and C. Lee Giles
  • Kr756 | Generalized Component Analysis for Text with Heterogeneous Attributes | Xuerui Wang, Chris Pal, and Andrew McCallum
  • Kr792 | Feature Selection Methods for Text Classification | Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, and Michael Mahoney
  • Kr452 | Cross-language information retrieval using PARAFAC2 | Peter Chew, Brett Bader, Tamara Kolda, and Ahmed Abdelali

    R11: Pattern Discovery (II)

    Tuesday 10:30 am ~ 11:50 am, Regency 2

  • Kr556 | Efficient Mining of Iterative Patterns for Software Specification Discovery | David Lo, Siau-Cheng Khoo, and Chao Liu
  • Kr605 | From frequent itemsets to semantically meaningful visual patterns | Junsong Yuan, Ying Wu, and Ming Yang
  • Kr673 | Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns | Lisa Friedland and David Jensen
  • Kr793 | Association Analysis-based Transformations for Protein Interaction Networks: A Function Prediction Case Study | Gaurav Pandey, Michael Steinbach, Rohit Gupta, Tushar Garg, and Vipin Kumar

    R12: Anomaly/Template Detection

    Tuesday 10:30 am ~ 11:50 am, Crystal

  • Kr447 | Weighting versus Pruning in Rule Validation for Detecting Network and Host Anomalies | Gaurav Tandon and Philip Chan
  • Kr449 | Cost-effective Outbreak Detection in Networks | Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance
  • Kr204 | Joint Optimization of Wrapper Generation and Template Detection | Shuyi Zheng, Ruihua Song, Di Wu, and Ji-Rong Wen
  • Kr333 | Detecting Anomalous Records in Categorical Datasets | Kaustav Das and Jeff Schneider

    R13: Web/Text Mining (V)

    Tuesday 2:00 pm ~ 3:20 pm, Regency 1

  • Kr291 | Mining Correlated Bursty Topic Patterns from Coordinated Text Streams | Xuanhui Wang, ChengXiang Zhai, Xiao Hu, and Richard Sproat
  • Kr551 | Mining Templates from Search Result Records of Search Engines | Hongkun Zhao, Weiyi Meng, and Clement Yu
  • Kr607 | Exploiting Underrepresented Query Aspects for Automatic Query Expansion | Daniel Crabtree, Peter Andreae, and Xiaoying Gao
  • Kr790 | Canonicalization of Database Records using Adaptive Similarity Measures | Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli, and Andrew McCallum

    R14: Statistical Methods (I)

    Tuesday 2:00 pm ~ 3:20 pm, Regency 2

  • Kr352 | Hierarchical Mixture Models: a Probabilistic Analysis | Mark Sandler
  • Kr398 | Information distance from a question to an answer | Xian Zhang, Yu Hao, Xiaoyan Zhu, and Ming Li
  • Kr420 | Statistical Change Detection for Multi-Dimensional Data | Xiuyao Song, Mingxi Wu, Chris Jermaine, and Sanjay Ranka
  • Kr424 | Learning the Kernel Matrix in Discriminant Analysis via Quadratically Constrained Quadratic Programming | Jieping Ye, Shuiwang Ji, and Jianhui Chen

    R15: Clustering (III)

    Tuesday 2:00 pm ~ 3:20 pm, Crystal

  • Kr434 | Constraint-Driven Clustering | Rong Ge, Martin Ester, Wen Jin, and Ian Davidson
  • Kr507 | A Spectral Clustering Approach to Optimally Combining Numerical Vectors with a Modular Network | Motoki Shiga, Ichigaku Takigawa, and Hiroshi Mamitsuka
  • Kr520 | Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis | Frizo Janssens, Wolfgang Glänzel, and Bart De Moor
  • Kr606 | Enhancing Semi-Supervised Clustering: A Feature Projection Perspective | Wei Tang, Hui Xiong, Shi Zhong, and Jie Wu

    R16: Mining Data Streams

    Tuesday 3:50 pm ~ 5:10 pm, Regency 1

  • Kr270 | Density-Based Clustering of Real-Time Stream Data | Yixin Chen and Li Tu
  • Kr430 | On String Classification in Data Streams | Charu Aggarwal and Philip S. Yu
  • Kr305 | A fast algorithm for finding frequent episodes in event streams | Srivatsan Laxman, Sastry P. S., and Unnikrishnan K. P.
  • Kr660 | Practical Learning from One Sided Feedback | D. Sculley

    R17: Statistical Methods (II)

    Tuesday 3:50 pm ~ 5:10 pm, Regency 2

  • Kr437 | Scalable Look-Ahead Linear Regression Trees | David Vogel, Ognian Asparouhov, and Tobias Scheffer
  • Kr475 | Estimating Rates of Rare Events at Multiple Resolutions | Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, and Mayssam Sayyadian
  • Kr717 | Predictive Discrete Latent Factor Models for Large Scale Dyadic Data | Deepak Agarwal and Srujana Merugu
  • Kr491 | A Scalable Modular Convex Solver for Regularized Risk Minimization | Quoc Le, Alex Smola, Choon Hui Teo, and Vishwanathan S V N

    R18: Clustering (IV)

    Tuesday 3:50 pm ~ 5:10 pm, Crystal

  • Kr664 | BoostCluster: Boosting Clustering by Pairwise Constraints | Yi Liu, Rong Jin, Anil Jain, and Pavan Mallapragada
  • Kr683 | Nonlinear Adaptive Distance Metric Learning for Clustering | Jianhui Chen, Zheng Zhao, Jieping Ye, and Huan Liu
  • Kr704 | A Framework for Simultaneous Co-clustering and Learning from Complex Data | Meghana Deodhar and Joydeep Ghosh
  • Kr773 | Joint Cluster Analysis of Attribute and Relationship Data Without A-Priori Specification of the Number of Clusters | Flavia Moser, Rong Ge, and Martin Ester

    R19: Temporal Data Mining

    Wednesday 9:00 am ~ 10:40 am, Regency 1

  • Kr560 | Stochastic Processes and Temporal Data Mining | Paul Cotofrei and Kilian Stoffel
  • Kr570 | Characterising the Difference | Jilles Vreeken, Matthijs van Leeuwen, and Arno Siebes
  • Kr600 | Structural and Temporal Analysis of the Blogosphere Through Community Factorization | Yun Chi, Shenghuo Zhu, Xiaodan Song, Junichi Tatemura, and Belle Tseng
  • Kr627 | Time-Dependent Event Hierarchy Construction | Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Huan Liu, and Philip S. Yu
  • Kr687 | GraphScope: Parameter-free Mining of Large Time-evolving Graphs | Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos

    R20: Classification (II)

    Wednesday 9:00 am ~ 10:40 am, Regency 2

  • Kr546 | Mining Optimal Decision Trees from Itemset Lattices | Siegfried Nijssen and Elisa Fromont
  • Kr614 | Partial Example Acquisition in Cost-Sensitive Learning | Victor S. Sheng and Charles X. Ling
  • Kr778 | Model-Shared Subspace Boosting for Multi-label Classification | Rong Yan, Jelena Tesic, and John Smith
  • Kr848 | Semi-Supervised Classification with Hybrid Generative/Discriminative Methods | Gregory Druck, Chris Pal, Xiaojin Zhu, and Andrew McCallum
  • Kr875 | Making Generative Classifiers Robust to Selection Bias | Andrew Smith and Charles Elkan

    R21: Statistical Methods (III)

    Wednesday 9:00 am ~ 10:40 am, Crystal

  • Kr540 | Privacy-Preservation for Gradient Descent Methods | Li Wan, Wee Keong Ng, Shuguo Han, and Vincent Lee
  • Kr423 | Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database | Zhen Guo, Zhongfei Zhang, Eric Xing, and Christos Faloutsos
  • Kr841 | Very Sparse Stable Random Projections for Dimension Reduction in the L-alpha Norm (where 0 < alpha <= 2) | Ping Li
  • Kr854 | Discovering the Hidden Structure of House Prices with a Non-Parametric Latent Manifold Model | Sumit Chopra, Trivikraman Thampy, John Leahy, Andrew Caplin, and Yann LeCun


    I1: Data Mining Techniques

    Monday 10:30 am ~ 12:30 pm, Regent Club

  • Invited industrial presentation 1 (Bharat Rao)
  • Extracting Relevant Named Entities for Automated Expense Reimbursement (Guangyu Zhu, Timothy Bethea, and Vikas Krishna)
  • Cleaning Disguised Missing Data: A Heuristic Approach (Ming Hua, Jian Pei)
  • Distributed Classification in Peer-to-Peer Networks (Ping Luo, Hui Xiong)

    I2: Data mining on the web

    Monday 2:00 pm - 3:20 pm, Regent Club

  • Corroborate and Learn Facts from the Web (Shubin Zhao, Jonathan Betz)
  • iLink: Search and Routing in Social Networks (Jeffrey Davitz, Jiye Yu, Sugato Basu, David Gutelius, Alexandra Harris)
  • Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO (Ron Kohavi, Randal M Henne, Dan Sommerfield)

    I3: User behavior mining

    Monday 3:50 pm - 5:10 pm, Regent Club

  • Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection (Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor, David Jensen, Henry Goldberg, John Komoroske)
  • An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs (Sitaram Asur, Srinivasan Parthasarathy, Duygu Ucar)
  • High Quantile Modeling for Customer Wallet Estimation with Other Applications (Claudia Perlich, Saharon Rosset, Richard Lawrence, and Bianca Zadrozny)

    I4: Data mining applications

    Tuesday 10:30 am - 11:50 am, Regent Club

  • Invited industrial presentation 2 (Joshua Goodman)
  • Mining complex power networks for blackout prevention (JunHua Zhao, ZhaoYang Dong, Pei Zhang)
  • On-board Analysis of Uncalibrated Data for a Spacecraft at Mars (Rebecca Castano, Kiri Wagstaff, Steve Chien, Timothy Stough, Benyang Tang)

    I5: Short presentations

    Tuesday 2:00 pm - 3:20 pm, Regent Club

  • Domain-Constrained Semi-Supervised Mining of Tracking Models in Sensor Networks (Rong Pan, Junhui Zhao, Wenchen Zheng, Jeffrey Junfeng Pan, Dou Shen, Jialin Pan, Qiang Yang)
  • Framework for Classification and Segmentation of Massive Audio Data Streams (Charu Aggarwal)
  • LungCAD: A Clinically Approved, Machine Learning System for Lung Cancer Detection (R Bharat Rao, Jinbo Bi, Glenn Fung, Marcos Salganicoff, Nancy Obuchowski, David Naidich)
  • Truth Discovery with Multiple Conflicting Information Providers on the Web (Xiaoxin Yin, Jiawei Han, and Philip S. Yu)
  • Detecting Changes in Large Data Sets of Payments Cards Data: A Case Study (Robert Grossman, Joseph Bugajski, Chris Curry, David Locke, and Steve Vejcik)
  • Event Summarization for System Management (Wei Peng, Charles Perng, Tao Li, and Haixun Wang)
  • Machine Learning for Stock Selection (Robert Yan and Charles X. Ling)
  • IMDS: Intelligent Malware Detection System (Yanfang Ye, Dingding Wang, Tao Li, Dongyi Ye)


    At KDD-07 all accepted papers can be accompanied by a poster presentation. We strongly encourage authors to make use of this opportunity, which gives attendees an additional chance to discuss your work with you.

    To help us plan the poster sessions, you must register your poster by sending a short note to Michael Berthold at berthold@ieee.org that includes the paper ID, the title of your poster, and the name of the author presenting it during the poster session. We will only reserve space for registered posters!

    Posters will have to fit within a 3 ft x 4 ft (roughly 90 cm x 120 cm) area.


  • Ramasamy Uthurusamy (General Motors, USA), Chair
  • Jerome Friedman (Stanford University, USA)
  • Jiawei Han (University of Illinois Urbana-Champaign, USA)
  • Vipin Kumar (University of Minnesota, USA)
  • Heikki Mannila (University of Helsinki, Finland)
  • Rajeev Motwani (Stanford University, USA)
  • Ramakrishnan Srikant (Google, USA)
  • Ian H. Witten and Eibe Frank (University of Waikato, New Zealand)
  • Xindong Wu (University of Vermont, USA)


