KDD 2007 Conference - Program Information

Program Information

General Schedule
Invited Talks
Research Track Sessions
Industrial & Government Track Sessions
News about Poster Papers
Awards Committee

General Schedule

Saturday, August 11th
5:00 pm - 9:00 pm	Registration
Sunday, August 12th
7:30 am - 8:00 pm	Registration (all day)
9:00 am - 5:00 pm	Full Day Workshops (KDD Cup Workshop will start at 8:45 am)
9:00 am - 12:00 pm	Morning Half Day Workshops
9:00 am - 12:00 pm	Morning Tutorials
10:00 am - 10:30 am	Coffee Break
12:00 pm - 2:00 pm	Lunch (on your own)
2:00 pm - 5:00 pm	Afternoon Half Day Workshops
2:00 pm - 5:00 pm	Afternoon Tutorials
3:00 pm - 3:30 pm	Coffee Break
6:00 pm - 6:15 pm	Opening Remarks by Pavel Berkhin, Rich Caruana and Xindong Wu
6:15 pm - 6:45 pm	ACM SIGKDD Award Presentations:
	* KDD-07 Best Paper Awards, Thorsten Joachims
	* Student Travel Awards, Kamal Nigam
	* KDD Cup Winners, Bing Liu
	* SIGKDD Service and Innovation Awards, Ramasamy Uthurusamy
6:45 pm - 7:30 pm	ACM SIGKDD Innovation Award Talk by Usama Fayyad
Monday, August 13th
7:30 am - 8:00 pm	Registration (all day)
7:30 am - 5:30 pm	Exhibits
7:30 am - 9:00 am	Continental Breakfast
9:00 am - 10:00 am	Invited Talk #1 - Jon Kleinberg, Cornell University
10:00 am - 10:30 am	Coffee Break 30 min break
10:30 am - 12:30 pm	Research Sessions R1, R2, R3 6 papers per session (120 mins)
10:30 am - 12:30 pm	Industry Session I1
12:30 pm - 2:00 pm	Lunch 90 min lunch
2:00 pm - 3:20 pm	Research Sessions R4, R5, R6 4 papers per session (80 mins)
2:00 pm - 3:20 pm	Industry Session I2
3:20 pm - 3:50 pm	Break 30 min break
3:50 pm - 5:10 pm	Research Sessions R7, R8, R9 4 papers per session (80 mins)
3:50 pm - 5:10 pm	Industry Session I3 main sessions end 5:10pm
6:15 pm - 9:15 pm	Poster Reception and Demonstration Session at the San Jose Museum of Art
Tuesday, August 14th
7:30 am - 5:00 pm	Registration (all day)
7:30 am - 8:00 pm	Exhibits
7:30 am - 9:00 am	Continental Breakfast
9:00 am - 10:00 am	Invited talk #2 - Usama Fayyad, Yahoo!
10:00 am - 10:30 am	Coffee Break 30 min break
10:30 am - 11:50 am	Research Sessions R10, R11, R12 4 papers per session (80 mins)
10:30 am - 11:50 am	Industry Session I4
12:00 pm - 2:00 pm	SIGKDD Business Lunch 120 min lunch
2:00 pm - 3:20 pm	Research Sessions R13, R14, R15 4 papers per session (80 mins)
2:00 pm - 3:20 pm	Industry Session I5
3:20 pm - 3:50 pm	Coffee Break 30 min break
3:50 pm - 5:10 pm	Research Sessions R16, R17, R18 4 papers per session (80 mins)
3:50 pm - 5:10 pm	Birds of a Feather (BOF) main sessions end 5:10pm
5:15 pm - 6:15 pm	KDD Transfer Meeting (KDD-07 and KDD-08 organizers)
5:40 pm - 8:00 pm	Second Poster Reception, Fairmont Hotel
8:00 pm - 10:00 pm	Program Committee and Organizing Committee Dinner (by invitation only)
Wednesday, August 15th
7:30 am - 9:00 am	Continental Breakfast
9:00 am - 10:40 am	Research Sessions R19, R20, R21 5 papers per session (100 mins)
9:00 am - 10:40 am	Panel Discussion
10:40 am - 11:15 am	Coffee Break 35 min break
11:15 am - 12:15 pm	Invited talk #3 - Chris Anderson, Wired Magazine
12:15 pm - 12:30 pm	Closing Remarks by Pavel Berkhin

Invited Talks

Jon Kleinberg, Cornell University
Monday 9:00 am ~ 10:00 am, Imperial

Challenges in Social Network Data: Processes, Privacy and Paradoxes

The proliferation of rich social media, on-line communities, and collectively produced knowledge resources has accelerated the convergence of technological and social networks, producing environments that reflect both the architecture of the underlying information systems and the social structure of their members. In studying the consequences of these developments, we are faced with the opportunity to analyze social network data at unprecedented levels of scale and temporal resolution; this has led to a growing body of research at the intersection of the computing and social sciences.

We discuss some of the current challenges in the analysis of large-scale social network data, focusing on two themes in particular: the inference of social processes from data, and the problem of maintaining individual privacy in studies of social networks. While early research on this type of data focused on structural questions, recent work has extended this to consider the social processes that unfold within the networks. Particular lines of investigation have focused on processes in on-line social systems related to communication, community formation, information-seeking and collective problem-solving, marketing, the spread of news, and the dynamics of popularity. There are a number of fundamental issues, however, for which we have relatively little understanding, including the extent to which the outcomes of these types of social processes are predictable from their early stages, the differences between properties of individuals and properties of aggregate populations in these types of data, and the extent to which similar social phenomena in different domains have uniform underlying explanations.

The second theme we pursue is concerned with the problem of privacy. While much of the research on large-scale social systems has been carried out on data that is public, some of the richest emerging sources of social interaction data come from settings such as e-mail, instant messaging, or phone communication in which users have strong expectations of privacy. How can such data be made available to researchers while protecting the privacy of the individuals represented in the data? Many of the standard approaches here are variations on the principle of anonymization: the names of individuals are replaced with meaningless unique identifiers, so that the network structure is maintained while private information has been suppressed.

In recent joint work with Lars Backstrom and Cynthia Dwork, we have identified some fundamental limitations on the power of network anonymization to ensure privacy. In particular, we describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes. The attacks are based on the uniqueness of small random subgraphs embedded in an arbitrary network, using ideas related to those found in arguments from Ramsey theory. Combined with other recent examples of privacy breaches in data containing rich textual or time-series information, these results suggest that anonymization contains pitfalls even in very simple settings. In this way, our approach can be seen as a step toward understanding how techniques of privacy-preserving data mining can inform how we think about the protection of even the most skeletal social network data.

Usama Fayyad, Yahoo!
Tuesday 9:00 am ~ 10:00 am, Imperial

From Mining the Web to Inventing the New Sciences Underlying the Internet

As the Internet continues to change the way we live, find information, communicate, and do business, it has also been taking on a dramatically increasing role in marketing and advertising. Unlike any prior mass medium, the Internet is unique in its interactivity and offers the ability to target and program messaging at the individual level. This, coupled with the richness of the data available for measurement, the variety of ways to utilize that data, and the heavy dependence of effective marketing on data-driven applications, makes data mining and statistical data analysis, modeling, and reporting an essential, mission-critical part of running an on-line business.

However, because of its novelty and the scale of data sets involved, few companies have figured out how to properly make use of this data. In this talk, I will review some of the challenges and opportunities in the utilization of data to drive this new generation of marketing systems. I will provide several examples of how data is utilized in critical ways to drive some of these capabilities. The discussion will be framed with the more general framework of Grand Challenges for data mining: pragmatic and technical.

I will conclude this presentation with a consideration of the larger issues surrounding the Internet as a technology that is ubiquitous in our lives, yet one where very little is understood, at the scientific level, in defining and understanding many of the basics the Internet enables: Community, Personalization, and the new Microeconomics of the web. This leads to an overview of the new Yahoo! Research organization and its aims: inventing the new sciences underlying what we do on the Internet, focusing on areas that have received little attention in the traditional academic circles. Some illustrative examples will be reviewed to make the ultimate goals more concrete.

Chris Anderson, Wired Magazine
Wednesday 11:15 am ~ 12:15 pm, Imperial

Calculating Latent Demand in the Long Tail

Chris Anderson is the author of the New York Times bestselling book The Long Tail: Why the Future of Business is Selling Less of More, which was published in 2006, and he runs a blog on the subject at longtail.com. In 2007 he was named one of the "Time 100," the newsmagazine's list of the 100 men and women whose power, talent or moral example is transforming the world.

Previously, he was at The Economist, where he served as U.S. Business Editor, Asia Business Editor (based in Hong Kong), and Technology Editor. He started The Economist's Internet coverage in 1994 and directed its initial web strategy. Mr. Anderson's media career began at the two premier science journals, Nature and Science, where he served in several editorial capacities. Prior to that he worked as a researcher at Los Alamos National Laboratory's meson physics facility and served as research assistant to the Chief Scientist of the Department of Transportation. He holds a Bachelor of Science degree in Physics from George Washington University and studied Quantum Mechanics and Science Journalism at the University of California at Berkeley.

Research Track Sessions

R1: Web/Text Mining (I)

Monday 10:30 am ~ 12:30 pm, Regency 1

Kr676 | Information Genealogy: Uncovering the Flow of Ideas in Non-Hyperlinked Document Databases | Benyah Shaparenko and Thorsten Joachims

Kr497 | Upping the Baseline for High-Precision Text Classifiers | Aleksander Kolcz and Wen-Tau Yih

Kr567 | Extracting Semantic Relations from Query Logs | Ricardo Baeza-Yates and Alessandro Tiberi

Kr722 | Multiscale Topic Tomography | Ramesh Nallapati, William W. Cohen, Susan Ditmore, John Lafferty, and Kin Ung

Kr734 | A Concept-based Model for Enhancing Text Categorization | Shady Shehata, Fakhri Karray, and Mohamed Kamel

Kr806 | Expertise modeling for matching papers with reviewers | David Mimno and Andrew McCallum

R2: Graph Mining and Social Networks

Monday 10:30 am ~ 12:30 pm, Regency 2

Kr710 | Fast Direction-Aware Proximity for Graph Mining | Hanghang Tong, Yehuda Koren, and Christos Faloutsos

Kr346 | Correlation Search in Graph Databases | Yiping Ke, James Cheng, and Wilfred Ng

Kr431 | SCAN: A Structural Clustering Algorithm for Networks | Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas A. J. Schweiger

Kr652 | A Framework For Community Identification in Dynamic Social Networks | Chayant Tantipathananandh, Tanya Y. Berger-Wolf, and David Kempe

Kr700 | Fast Best-Effort Pattern Matching in Large Attributed Graphs | Hanghang Tong, Brian Gallagher, Christos Faloutsos, and Tina Eliassi-Rad

Kr781 | Temporal Causal Modeling with Graphical Granger Methods | Andrew Arnold, Yan Liu, and Naoki Abe

R3: Filtering and Ranking

Monday 10:30 am ~ 12:30 pm, Crystal

Kr288 | Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing | Seung-Taek Park and David Pennock

Kr414 | Use of Ranked Cross Document Evidence Trails for Hypothesis Generation | Rohini Srihari, Li Xu, and Tushar Saxena

Kr791 | A Learning Framework using Green's Function and Kernel Regularization with Application to Recommender System | Chris Ding, Rong Jin, Tao Li, and Horst Simon

Kr679 | Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems | Robert Bell, Yehuda Koren, and Chris Volinsky

Kr827 | Real-time Ranking with Concept Drift Using Expert Advice | Hila Becker and Marta Arias

Kr702 | Active Exploration for Learning Rankings from Clickthrough Data | Filip Radlinski and Thorsten Joachims

R4: Web/Text Mining (II)

Monday 2:00 pm ~ 3:20 pm, Regency 1

Kr812 | Development of NeuroElectroMagnetic Ontologies (NEMO): A Framework for Mining Brain Wave Ontologies | Dejing Dou, Gwen Frishkoff, Jiawei Rong, Robert Frank, Allen Malony, and Don Tucker

Kr301 | Exploiting Duality in Summarization with Deterministic Guarantees | Panagiotis Karras, Dimitris Sacharidis, and Nikos Mamoulis

Kr465 | Webpage Understanding: an Integrated Approach | Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, and Hsiao-Wuen Hon

Kr490 | Show me the money! Deriving the Pricing Power of Product Features by Mining Consumer Reviews | Nikolay Archak, Anindya Ghose, and Panagiotis Ipeirotis

R5: Classification (I)

Monday 2:00 pm ~ 3:20 pm, Regency 2

Kr259 | Support Feature Machine for Classification of Abnormal Brain Activity | W. Art Chaovalitwongse, Ya-Ju Fan, and Rajesh Sachdeo

Kr641 | Automatic Labeling of Multinomial Topic Models | Qiaozhu Mei, Xuehua Shen, and ChengXiang Zhai

Kr348 | Mining Statistically Important Equivalence Classes | Jinyan Li, Guimei Liu, and Limsoon Wong

Kr504 | Local Decomposition for Rare Class Analysis | Junjie Wu, Hui Xiong, Peng Wu, and Jian Chen

R6: Clustering (I)

Monday 2:00 pm ~ 3:20 pm, Crystal

Kr218 | The Minimum Consistent Subset Cover Problem and its Applications in Data Mining | Byron J. Gao, Martin Ester, Jin-Yi Cai, Oliver Schulte, and Hui Xiong

Kr335 | Co-clustering based Classification for Out-of-domain Documents | Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu

Kr448 | Efficient Incremental Clustering with Constraints | Ian Davidson, S.S. Ravi, and Martin Ester

Kr565 | A Probabilistic Framework for Relational Clustering | Bo Long, Zhongfei Zhang, and Philip S. Yu

R7: Web/Text Mining (III)

Monday 3:50 pm ~ 5:10 pm, Regency 1

Kr555 | Knowledge Discovery of Multiple-topic Document using Parametric Mixture Model with Dirichlet Prior | Issei Sato and Hiroshi Nakagawa

Kr689 | Tracking Multiple Topics for Finding Interesting Articles | Raymond Pon, Alfonso Cardenas, David Buttler, and Terence Critchlow

Kr693 | Efficient and Effective Explanation of Change in Hierarchical Summaries | Deepak Agarwal, Dhiman Barman, Dimitrios Gunopulos, Flip Korn, Divesh Srivastava, and Neal Young

Kr712 | Content-based Document Routing and Index Partitioning for Scalable Similarity-based Searches in a Large Corpus | Deepavali Bhagwat, Kave Eshghi, and Pankaj Mehra

R8: Pattern Discovery (I)

Monday 3:50 pm ~ 5:10 pm, Regency 2

Kr276 | Trajectory Pattern Mining | Fosca Giannotti, Mirco Nanni, Dino Pedreschi, and Fabio Pinelli

Kr374 | Finding low-entropy sets and trees from binary data | Hannes Heikinheimo, Eino Hinkkanen, Heikki Mannila, Taneli Mielikäinen, and Jouni Seppänen

Kr322 | Detecting Motifs Under Uniform Scaling | Dragomir Yankov, Eamonn Keogh, Jose Medina, Bill Chiu, and Victor Zordan

Kr502 | Mining Favorable Facets | Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, and Ke Wang

R9: Clustering (II)

Monday 3:50 pm ~ 5:10 pm, Crystal

Kr777 | Evolutionary Spectral Clustering by Incorporating Temporal Smoothness | Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino, and Belle Tseng

Kr205 | Using Hierarchical Clustering for Learning | Vincent Schickel and Boi Faltings

Kr405 | Nestedness and segmented nestedness | Heikki Mannila and Evimaria Terzi

Kr412 | XProj: A Framework for Projected Structural Clustering of XML Documents | Charu Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, and Mohammed Zaki

R10: Web/Text Mining (IV)

Tuesday 10:30 am ~ 11:50 am, Regency 1

Kr751 | Detecting research topics via the correlation between graphs and texts | Yookyung Jo, Carl Lagoze, and C. Lee Giles

Kr756 | Generalized Component Analysis for Text with Heterogeneous Attributes | Xuerui Wang, Chris Pal, and Andrew McCallum

Kr792 | Feature Selection Methods for Text Classification | Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, and Michael Mahoney

Kr452 | Cross-language information retrieval using PARAFAC2 | Peter Chew, Brett Bader, Tamara Kolda, and Ahmed Abdelali

R11: Pattern Discovery (II)

Tuesday 10:30 am ~ 11:50 am, Regency 2

Kr556 | Efficient Mining of Iterative Patterns for Software Specification Discovery | David Lo, Siau-Cheng Khoo, and Chao Liu

Kr605 | From frequent itemsets to semantically meaningful visual patterns | Junsong Yuan, Ying Wu, and Ming Yang

Kr673 | Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns | Lisa Friedland and David Jensen

Kr793 | Association Analysis-based Transformations for Protein Interaction Networks: A Function Prediction Case Study | Gaurav Pandey, Michael Steinbach, Rohit Gupta, Tushar Garg, and Vipin Kumar

R12: Anomaly/Template Detection

Tuesday 10:30 am ~ 11:50 am, Crystal

Kr447 | Weighting versus Pruning in Rule Validation for Detecting Network and Host Anomalies | Gaurav Tandon and Philip Chan

Kr449 | Cost-effective Outbreak Detection in Networks | Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance

Kr204 | Joint Optimization of Wrapper Generation and Template Detection | Shuyi Zheng, Ruihua Song, Di Wu, and Ji-Rong Wen

Kr333 | Detecting Anomalous Records in Categorical Datasets | Kaustav Das and Jeff Schneider

R13: Web/Text Mining (V)

Tuesday 2:00 pm ~ 3:20 pm, Regency 1

Kr291 | Mining Correlated Bursty Topic Patterns from Coordinated Text Streams | Xuanhui Wang, ChengXiang Zhai, Xiao Hu, and Richard Sproat

Kr551 | Mining Templates from Search Result Records of Search Engines | Hongkun Zhao, Weiyi Meng, and Clement Yu

Kr607 | Exploiting Underrepresented Query Aspects for Automatic Query Expansion | Daniel Crabtree, Peter Andreae, and Xiaoying Gao

Kr790 | Canonicalization of Database Records using Adaptive Similarity Measures | Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli, and Andrew McCallum

R14: Statistical Methods (I)

Tuesday 2:00 pm ~ 3:20 pm, Regency 2

Kr352 | Hierarchical Mixture Models: a Probabilistic Analysis | Mark Sandler

Kr398 | Information distance from a question to an answer | Xian Zhang, Yu Hao, Xiaoyan Zhu, and Ming Li

Kr420 | Statistical Change Detection for Multi-Dimensional Data | Xiuyao Song, Mingxi Wu, Chris Jermaine, and Sanjay Ranka

Kr424 | Learning the Kernel Matrix in Discriminant Analysis via Quadratically Constrained Quadratic Programming | Jieping Ye, Shuiwang Ji, and Jianhui Chen

R15: Clustering (III)

Tuesday 2:00 pm ~ 3:20 pm, Crystal

Kr434 | Constraint-Driven Clustering | Rong Ge, Martin Ester, Wen Jin, and Ian Davidson

Kr507 | A Spectral Clustering Approach to Optimally Combining Numerical Vectors with a Modular Network | Motoki Shiga, Ichigaku Takigawa, and Hiroshi Mamitsuka

Kr520 | Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis | Frizo Janssens, Wolfgang Glänzel, and Bart De Moor

Kr606 | Enhancing Semi-Supervised Clustering: A Feature Projection Perspective | Wei Tang, Hui Xiong, Shi Zhong, and Jie Wu

R16: Mining Data Streams

Tuesday 3:50 pm ~ 5:10 pm, Regency 1

Kr270 | Density-Based Clustering of Real-Time Stream Data | Yixin Chen and Li Tu

Kr430 | On String Classification in Data Streams | Charu Aggarwal and Philip S. Yu

Kr305 | A fast algorithm for finding frequent episodes in event streams | Srivatsan Laxman, Sastry P. S., and Unnikrishnan K. P.

Kr660 | Practical Learning from One Sided Feedback | D. Sculley

R17: Statistical Methods (II)

Tuesday 3:50 pm ~ 5:10 pm, Regency 2

Kr437 | Scalable Look-Ahead Linear Regression Trees | David Vogel, Ognian Asparouhov, and Tobias Scheffer

Kr475 | Estimating Rates of Rare Events at Multiple Resolutions | Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, and Mayssam Sayyadian

Kr717 | Predictive Discrete Latent Factor Models for Large Scale Dyadic Data | Deepak Agarwal and Srujana Merugu

Kr491 | A Scalable Modular Convex Solver for Regularized Risk Minimization | Quoc Le, Alex Smola, Choon Hui Teo, and Vishwanathan S V N

R18: Clustering (IV)

Tuesday 3:50 pm ~ 5:10 pm, Crystal

Kr664 | BoostCluster: Boosting Clustering by Pairwise Constraints | Yi Liu, Rong Jin, Anil Jain, and Pavan Mallapragada

Kr683 | Nonlinear Adaptive Distance Metric Learning for Clustering | Jianhui Chen, Zheng Zhao, Jieping Ye, and Huan Liu

Kr704 | A Framework for Simultaneous Co-clustering and Learning from Complex Data | Meghana Deodhar and Joydeep Ghosh

Kr773 | Joint Cluster Analysis of Attribute and Relationship Data Without A-Priori Specification of the Number of Clusters | Flavia Moser, Rong Ge, and Martin Ester

R19: Temporal Data Mining

Wednesday 9:00 am ~ 10:40 am, Regency 1

Kr560 | Stochastic Processes and Temporal Data Mining | Paul Cotofrei and Kilian Stoffel

Kr570 | Characterising the Difference | Jilles Vreeken, Matthijs van Leeuwen, and Arno Siebes

Kr600 | Structural and Temporal Analysis of the Blogosphere Through Community Factorization | Yun Chi, Shenghuo Zhu, Xiaodan Song, Junichi Tatemura, and Belle Tseng

Kr627 | Time-Dependent Event Hierarchy Construction | Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Huan Liu, and Philip S. Yu

Kr687 | GraphScope: Parameter-free Mining of Large Time-evolving Graphs | Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos

R20: Classification (II)

Wednesday 9:00 am ~ 10:40 am, Regency 2

Kr546 | Mining Optimal Decision Trees from Itemset Lattices | Siegfried Nijssen and Elisa Fromont

Kr614 | Partial Example Acquisition in Cost-Sensitive Learning | Victor S. Sheng and Charles X. Ling

Kr778 | Model-Shared Subspace Boosting for Multi-label Classification | Rong Yan, Jelena Tesic, and John Smith

Kr848 | Semi-Supervised Classification with Hybrid Generative/Discriminative Methods | Gregory Druck, Chris Pal, Xiaojin Zhu, and Andrew McCallum

Kr875 | Making Generative Classifiers Robust to Selection Bias | Andrew Smith and Charles Elkan

R21: Statistical Methods (III)

Wednesday 9:00 am ~ 10:40 am, Crystal

Kr540 | Privacy-Preservation for Gradient Descent Methods | Li Wan, Wee Keong Ng, Shuguo Han, and Vincent Lee

Kr423 | Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database | Zhen Guo, Zhongfei Zhang, Eric Xing, and Christos Faloutsos

Kr841 | Very Sparse Stable Random Projections for Dimension Reduction in the L-alpha Norm (where 0 < alpha <=2) | Ping Li

Kr854 | Discovering the Hidden Structure of House Prices with a Non-Parametric Latent Manifold Model | Sumit Chopra, Trivikraman Thampy, John Leahy, Andrew Caplin, and Yann LeCun

Industrial & Government Track Sessions

I1: Data Mining Techniques

Monday 10:30 am ~ 12:30 pm, Regent Club

Invited industrial presentation 1 (Bharat Rao)

Extracting Relevant Named Entities for Automated Expense Reimbursement (Guangyu Zhu, Timothy Bethea, and Vikas Krishna)

Cleaning Disguised Missing Data: A Heuristic Approach (Ming Hua, Jian Pei)

Distributed Classification in Peer-to-Peer Networks (Ping Luo, Hui Xiong)

I2: Data mining on the web

Monday 2:00 pm - 3:20 pm, Regent Club

Corroborate and Learn Facts from the Web (Shubin Zhao, Jonathan Betz)

iLink: Search and Routing in Social Networks (Jeffrey Davitz, Jiye Yu, Sugato Basu, David Gutelius, Alexandra Harris)

Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO (Ron Kohavi, Randal M Henne, Dan Sommerfield)

I3: User behavior mining

Monday 3:50 pm - 5:10 pm, Regent Club

Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection (Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor, David Jensen, Henry Goldberg, John Komoroske)

An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs (Sitaram Asur, Srinivasan Parthasarathy, Duygu Ucar)

High Quantile Modeling for Customer Wallet Estimation with Other Applications (Claudia Perlich, Saharon Rosset, Richard Lawrence, and Bianca Zadrozny)

I4: Data mining applications

Tuesday 10:30 am - 11:50 am, Regent Club

Invited industrial presentation 2 (Joshua Goodman)

Mining complex power networks for blackout prevention (JunHua Zhao, ZhaoYang Dong, Pei Zhang)

On-board Analysis of Uncalibrated Data for a Spacecraft at Mars (Rebecca Castano, Kiri Wagstaff, Steve Chien, Timothy Stough, Benyang Tang)

I5: Short presentations

Tuesday 2:00 pm - 3:20 pm, Regent Club

Domain-Constrained Semi-Supervised Mining of Tracking Models in Sensor Networks (Rong Pan, Junhui Zhao, Wenchen Zheng, Jeffrey Junfeng Pan, Dou Shen, Jialin Pan, Qiang Yang)

Framework for Classification and Segmentation of Massive Audio Data Streams (Charu Aggarwal)

LungCAD: A Clinically Approved, Machine Learning System for Lung Cancer Detection (R Bharat Rao, Jinbo Bi, Glenn Fung, Marcos Salganicoff, Nancy Obuchowski, David Naidich)

Truth Discovery with Multiple Conflicting Information Providers on the Web (Xiaoxin Yin, Jiawei Han, and Philip S. Yu)

Detecting Changes in Large Data Sets of Payments Cards Data: A Case Study (Robert Grossman, Joseph Bugajski, Chris Curry, David Locke, and Steve Vejcik)

Event Summarization for System Management (Wei Peng, Charles Perng, Tao Li, and Haixun Wang)

Machine Learning for Stock Selection (Robert Yan and Charles X. Ling)

IMDS: Intelligent Malware Detection System (Yanfang Ye, Dingding Wang, Tao Li, Dongyi Ye)

News about Poster Papers for Authors of Accepted Papers

At KDD-07, all accepted papers can be accompanied by a poster presentation. We strongly encourage you to use this opportunity to give attendees an additional chance to discuss your work with you.

To help us plan the poster sessions, you must register your poster by sending a short note to Michael Berthold at berthold@ieee.org that includes the paper ID, the title of your poster, and the author presenting it during the poster session. We will only reserve space for registered posters!

Posters will have to fit within a 3' x 4' (roughly 90cm x 120cm) area.

2007 Awards Committee

Ramasamy Uthurusamy (General Motors, USA), Chair

Jerome Friedman (Stanford University, USA)

Jiawei Han (University of Illinois Urbana-Champaign, USA)

Vipin Kumar (University of Minnesota, USA)

Heikki Mannila (University of Helsinki, Finland)

Rajeev Motwani (Stanford University, USA)

Ramakrishnan Srikant (Google, USA)

Ian H. Witten and Eibe Frank (University of Waikato, New Zealand)

Xindong Wu (University of Vermont, USA)
