| Saturday, August 11th | |
| 5:00 pm - 9:00 pm | Registration |
| Sunday, August 12th | |
| 7:30 am - 8:00 pm | Registration (all day) |
| 9:00 am - 5:00 pm | Full Day Workshops (KDD Cup Workshop will start at 8:45 am) |
| 9:00 am - 12:00 pm | Morning Half Day Workshops |
| 9:00 am - 12:00 pm | Morning Tutorials |
| 10:00 am - 10:30 am | Coffee Break |
| 12:00 pm - 2:00 pm | Lunch (on your own) |
| 2:00 pm - 5:00 pm | Afternoon Half Day Workshops |
| 2:00 pm - 5:00 pm | Afternoon Tutorials |
| 3:00 pm - 3:30 pm | Coffee Break |
| 6:00 pm - 6:15 pm | Opening Remarks by Pavel Berkhin, Rich Caruana and Xindong Wu |
| 6:15 pm - 6:45 pm | ACM SIGKDD Award Presentations: |
| * KDD-07 Best Paper Awards, Thorsten Joachims |
| * Student Travel Awards, Kamal Nigam |
| * KDD Cup Winners, Bing Liu |
| * SIGKDD Service and Innovation Awards, Ramasamy Uthurusamy |
| 6:45 pm - 7:30 pm | ACM SIGKDD Innovation Award Talk by Usama Fayyad |
| Monday, August 13th | |
| 7:30 am - 8:00 pm | Registration (all day) |
| 7:30 am - 5:30 pm | Exhibits |
| 7:30 am - 9:00 am | Continental Breakfast |
| 9:00 am - 10:00 am | Invited Talk #1 - Jon Kleinberg, Cornell University |
| 10:00 am - 10:30 am | Coffee Break (30 min) |
| 10:30 am - 12:30 pm | Research Sessions R1, R2, R3 (6 papers per session, 120 mins) |
| 10:30 am - 12:30 pm | Industry Session I1 |
| 12:30 pm - 2:00 pm | Lunch (90 min) |
| 2:00 pm - 3:20 pm | Research Sessions R4, R5, R6 (4 papers per session, 80 mins) |
| 2:00 pm - 3:20 pm | Industry Session I2 |
| 3:20 pm - 3:50 pm | Break (30 min) |
| 3:50 pm - 5:10 pm | Research Sessions R7, R8, R9 (4 papers per session, 80 mins) |
| 3:50 pm - 5:10 pm | Industry Session I3 (main sessions end 5:10 pm) |
| 6:15 pm - 9:15 pm | Poster Reception and Demonstration Session at the San Jose Museum of Art |
| Tuesday, August 14th | |
| 7:30 am - 5:00 pm | Registration (all day) |
| 7:30 am - 8:00 pm | Exhibits |
| 7:30 am - 9:00 am | Continental Breakfast |
| 9:00 am - 10:00 am | Invited Talk #2 - Usama Fayyad, Yahoo! |
| 10:00 am - 10:30 am | Coffee Break (30 min) |
| 10:30 am - 11:50 am | Research Sessions R10, R11, R12 (4 papers per session, 80 mins) |
| 10:30 am - 11:50 am | Industry Session I4 |
| 12:00 pm - 2:00 pm | SIGKDD Business Lunch (120 min) |
| 2:00 pm - 3:20 pm | Research Sessions R13, R14, R15 (4 papers per session, 80 mins) |
| 2:00 pm - 3:20 pm | Industry Session I5 |
| 3:20 pm - 3:50 pm | Coffee Break (30 min) |
| 3:50 pm - 5:10 pm | Research Sessions R16, R17, R18 (4 papers per session, 80 mins) |
| 3:50 pm - 5:10 pm | Birds of a Feather (BOF) (main sessions end 5:10 pm) |
| 5:15 pm - 6:15 pm | KDD Transfer Meeting (KDD-07 and KDD-08 organizers) |
| 5:40 pm - 8:00 pm | Second Poster Reception, Fairmont Hotel |
| 8:00 pm - 10:00 pm | Program Committee and Organizing Committee Dinner (by invitation only) |
| Wednesday, August 15th | |
| 7:30 am - 9:00 am | Continental Breakfast |
| 9:00 am - 10:40 am | Research Sessions R19, R20, R21 (5 papers per session, 100 mins) |
| 9:00 am - 10:40 am | Panel Discussion |
| 10:40 am - 11:15 am | Coffee Break (35 min) |
| 11:15 am - 12:15 pm | Invited Talk #3 - Chris Anderson, Wired Magazine |
| 12:15 pm - 12:30 pm | Closing Remarks by Pavel Berkhin |
Jon Kleinberg, Cornell University
Monday 9:00 am ~ 10:00 am, Imperial
Challenges in Social Network Data: Processes, Privacy and Paradoxes
The proliferation of rich social media, on-line communities, and collectively produced knowledge resources has accelerated the convergence of technological and social networks, producing environments that reflect both the architecture of the underlying information systems and the social structure of their members. In studying the consequences of these developments, we are faced with the opportunity to analyze social network data at unprecedented levels of scale and temporal resolution; this has led to a growing body of research at the intersection of the computing and social sciences.
We discuss some of the current challenges in the analysis of large-scale social network data, focusing on two themes in particular: the inference of social processes from data, and the problem of maintaining individual privacy in studies of social networks. While early research on this type of data focused on structural questions, recent work has extended this to consider the social processes that unfold within the networks. Particular lines of investigation have focused on processes in on-line social systems related to communication, community formation, information-seeking and collective problem-solving, marketing, the spread of news, and the dynamics of popularity. There are a number of fundamental issues, however, for which we have relatively little understanding, including the extent to which the outcomes of these types of social processes are predictable from their early stages, the differences between properties of individuals and properties of aggregate populations in these types of data, and the extent to which similar social phenomena in different domains have uniform underlying explanations.
The second theme we pursue is concerned with the problem of privacy. While much of the research on large-scale social systems has been carried out on data that is public, some of the richest emerging sources of social interaction data come from settings such as e-mail, instant messaging, or phone communication in which users have strong expectations of privacy. How can such data be made available to researchers while protecting the privacy of the individuals represented in the data? Many of the standard approaches here are variations on the principle of anonymization: the names of individuals are replaced with meaningless unique identifiers, so that the network structure is maintained while private information has been suppressed.
In recent joint work with Lars Backstrom and Cynthia Dwork, we have identified some fundamental limitations on the power of network anonymization to ensure privacy. In particular, we describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes. The attacks are based on the uniqueness of small random subgraphs embedded in an arbitrary network, using ideas related to those found in arguments from Ramsey theory. Combined with other recent examples of privacy breaches in data containing rich textual or time-series information, these results suggest that anonymization contains pitfalls even in very simple settings. In this way, our approach can be seen as a step toward understanding how techniques of privacy-preserving data mining can inform how we think about the protection of even the most skeletal social network data.
Usama Fayyad, Yahoo!
Tuesday 9:00 am ~ 10:00 am, Imperial
From Mining the Web to Inventing the New Sciences Underlying the Internet
As the Internet continues to change the way we live, find information, communicate, and do business, it has also been taking on a dramatically increasing role in marketing and advertising. Unlike any prior mass medium, the Internet is a unique medium when it comes to interactivity and offers the ability to target and program messaging at the individual level. This, coupled with its uniqueness in the richness of the data that is available for measurability, in the variety of ways to utilize the data, and in the great dependence of effective marketing on applications that are heavily data-driven, makes data mining and statistical data analysis, modeling, and reporting an essential, mission-critical part of running the on-line business.
However, because of its novelty and the scale of data sets involved, few companies have figured out how to properly make use of this data. In this talk, I will review some of the challenges and opportunities in the utilization of data to drive this new generation of marketing systems. I will provide several examples of how data is utilized in critical ways to drive some of these capabilities. The discussion will be framed within the more general framework of Grand Challenges for data mining: pragmatic and technical.
I will conclude this presentation with a consideration of the larger issues surrounding the Internet as a technology that is ubiquitous in our lives, yet one where very little is understood, at the scientific level, in defining and understanding many of the basics the Internet enables: Community, Personalization, and the new Microeconomics of the web. This leads to an overview of the new Yahoo! Research organization and its aims: inventing the new sciences underlying what we do on the Internet, focusing on areas that have received little attention in the traditional academic circles. Some illustrative examples will be reviewed to make the ultimate goals more concrete.
Chris Anderson, Wired Magazine
Wednesday 11:15 am ~ 12:15 pm, Imperial
Calculating Latent Demand in the Long Tail
Chris Anderson is the author of the New York Times bestselling book The Long Tail: Why the Future of Business is Selling Less of More, which was published in 2006, and runs a blog on the subject at longtail.com. In 2007 he was named one of the "Time 100," the newsmagazine's list of the 100 men and women whose power, talent or moral example is transforming the world.
Previously, he was at The Economist, where he served as U.S. Business Editor, Asia Business Editor (based in Hong Kong), and Technology Editor. He started The Economist's Internet coverage in 1994 and directed its initial web strategy. Mr. Anderson's media career began at the two premier science journals, Nature and Science, where he served in several editorial capacities. Prior to that he worked as a researcher at Los Alamos National Laboratory's meson physics facility and served as research assistant to the Chief Scientist of the Department of Transportation. He holds a Bachelor of Science degree in Physics from George Washington University and studied Quantum Mechanics and Science Journalism at the University of California at Berkeley.
R1: Web/Text Mining (I)
Monday 10:30 am ~ 12:30 pm, Regency 1
Kr676 | Information Genealogy: Uncovering the Flow of Ideas in Non-Hyperlinked Document Databases | Benyah Shaparenko and Thorsten Joachims
Kr497 | Upping the Baseline for High-Precision Text Classifiers | Aleksander Kolcz and Wen-Tau Yih
Kr567 | Extracting Semantic Relations from Query Logs | Ricardo Baeza-Yates and Alessandro Tiberi
Kr722 | Multiscale Topic Tomography | Ramesh Nallapati, William W. Cohen, Susan Ditmore, John Lafferty, and Kin Ung
Kr734 | A Concept-based Model for Enhancing Text Categorization | Shady Shehata, Fakhri Karray, and Mohamed Kamel
Kr806 | Expertise modeling for matching papers with reviewers | David Mimno and Andrew McCallum
R2: Graph Mining and Social Networks
Monday 10:30 am ~ 12:30 pm, Regency 2
Kr710 | Fast Direction-Aware Proximity for Graph Mining | Hanghang Tong, Yehuda Koren, and Christos Faloutsos
Kr346 | Correlation Search in Graph Databases | Yiping Ke, James Cheng, and Wilfred Ng
Kr431 | SCAN: A Structural Clustering Algorithm for Networks | Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas A. J. Schweiger
Kr652 | A Framework For Community Identification in Dynamic Social Networks | Chayant Tantipathananandh, Tanya Y. Berger-Wolf, and David Kempe
Kr700 | Fast Best-Effort Pattern Matching in Large Attributed Graphs | Hanghang Tong, Brian Gallagher, Christos Faloutsos, and Tina Eliassi-Rad
Kr781 | Temporal Causal Modeling with Graphical Granger Methods | Andrew Arnold, Yan Liu, and Naoki Abe
R3: Filtering and Ranking
Monday 10:30 am ~ 12:30 pm, Crystal
Kr288 | Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing | Seung-Taek Park and David Pennock
Kr414 | Use of Ranked Cross Document Evidence Trails for Hypothesis Generation | Rohini Srihari, Li Xu, and Tushar Saxena
Kr791 | A Learning Framework using Green's Function and Kernel Regularization with Application to Recommender System | Chris Ding, Rong Jin, Tao Li, and Horst Simon
Kr679 | Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems | Robert Bell, Yehuda Koren, and Chris Volinsky
Kr827 | Real-time Ranking with Concept Drift Using Expert Advice | Hila Becker and Marta Arias
Kr702 | Active Exploration for Learning Rankings from Clickthrough Data | Filip Radlinski and Thorsten Joachims
R4: Web/Text Mining (II)
Monday 2:00 pm ~ 3:20 pm, Regency 1
Kr812 | Development of NeuroElectroMagnetic Ontologies (NEMO): A Framework for Mining Brain Wave Ontologies | Dejing Dou, Gwen Frishkoff, Jiawei Rong, Robert Frank, Allen Malony, and Don Tucker
Kr301 | Exploiting Duality in Summarization with Deterministic Guarantees | Panagiotis Karras, Dimitris Sacharidis, and Nikos Mamoulis
Kr465 | Webpage Understanding: an Integrated Approach | Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, and Hsiao-Wuen Hon
Kr490 | Show me the money! Deriving the Pricing Power of Product Features by Mining Consumer Reviews | Nikolay Archak, Anindya Ghose, and Panagiotis Ipeirotis
R5: Classification (I)
Monday 2:00 pm ~ 3:20 pm, Regency 2
Kr259 | Support Feature Machine for Classification of Abnormal Brain Activity | W. Art Chaovalitwongse, Ya-Ju Fan, and Rajesh Sachdeo
Kr641 | Automatic Labeling of Multinomial Topic Models | Qiaozhu Mei, Xuehua Shen, and ChengXiang Zhai
Kr348 | Mining Statistically Important Equivalence Classes | Jinyan Li, Guimei Liu, and Limsoon Wong
Kr504 | Local Decomposition for Rare Class Analysis | Junjie Wu, Hui Xiong, Peng Wu, and Jian Chen
R6: Clustering (I)
Monday 2:00 pm ~ 3:20 pm, Crystal
Kr218 | The Minimum Consistent Subset Cover Problem and its Applications in Data Mining | Byron J. Gao, Martin Ester, Jin-Yi Cai, Oliver Schulte, and Hui Xiong
Kr335 | Co-clustering based Classification for Out-of-domain Documents | Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu
Kr448 | Efficient Incremental Clustering with Constraints | Ian Davidson, S.S. Ravi, and Martin Ester
Kr565 | A Probabilistic Framework for Relational Clustering | Bo Long, Zhongfei Zhang, and Philip S. Yu
R7: Web/Text Mining (III)
Monday 3:50 pm ~ 5:10 pm, Regency 1
Kr555 | Knowledge Discovery of Multiple-topic Document using Parametric Mixture Model with Dirichlet Prior | Issei Sato and Hiroshi Nakagawa
Kr689 | Tracking Multiple Topics for Finding Interesting Articles | Raymond Pon, Alfonso Cardenas, David Buttler, and Terence Critchlow
Kr693 | Efficient and Effective Explanation of Change in Hierarchical Summaries | Deepak Agarwal, Dhiman Barman, Dimitrios Gunopulos, Flip Korn, Divesh Srivastava, and Neal Young
Kr712 | Content-based Document Routing and Index Partitioning for Scalable Similarity-based Searches in a Large Corpus | Deepavali Bhagwat, Kave Eshghi, and Pankaj Mehra
R8: Pattern Discovery (I)
Monday 3:50 pm ~ 5:10 pm, Regency 2
Kr276 | Trajectory Pattern Mining | Fosca Giannotti, Mirco Nanni, Dino Pedreschi, and Fabio Pinelli
Kr374 | Finding low-entropy sets and trees from binary data | Hannes Heikinheimo, Eino Hinkkanen, Heikki Mannila, Taneli Mielikäinen, and Jouni Seppänen
Kr322 | Detecting Motifs Under Uniform Scaling | Dragomir Yankov, Eamonn Keogh, Jose Medina, Bill Chiu, and Victor Zordan
Kr502 | Mining Favorable Facets | Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, and Ke Wang
R9: Clustering (II)
Monday 3:50 pm ~ 5:10 pm, Crystal
Kr777 | Evolutionary Spectral Clustering by Incorporating Temporal Smoothness | Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino, and Belle Tseng
Kr205 | Using Hierarchical Clustering for Learning | Vincent Schickel and Boi Faltings
Kr405 | Nestedness and segmented nestedness | Heikki Mannila and Evimaria Terzi
Kr412 | XProj: A Framework for Projected Structural Clustering of XML Documents | Charu Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, and Mohammed Zaki
R10: Web/Text Mining (IV)
Tuesday 10:30 am ~ 11:50 am, Regency 1
Kr751 | Detecting research topics via the correlation between graphs and texts | Yookyung Jo, Carl Lagoze, and C. Lee Giles
Kr756 | Generalized Component Analysis for Text with Heterogeneous Attributes | Xuerui Wang, Chris Pal, and Andrew McCallum
Kr792 | Feature Selection Methods for Text Classification | Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, and Michael Mahoney
Kr452 | Cross-language information retrieval using PARAFAC2 | Peter Chew, Brett Bader, Tamara Kolda, and Ahmed Abdelali
R11: Pattern Discovery (II)
Tuesday 10:30 am ~ 11:50 am, Regency 2
Kr556 | Efficient Mining of Iterative Patterns for Software Specification Discovery | David Lo, Siau-Cheng Khoo, and Chao Liu
Kr605 | From frequent itemsets to semantically meaningful visual patterns | Junsong Yuan, Ying Wu, and Ming Yang
Kr673 | Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns | Lisa Friedland and David Jensen
Kr793 | Association Analysis-based Transformations for Protein Interaction Networks: A Function Prediction Case Study | Gaurav Pandey, Michael Steinbach, Rohit Gupta, Tushar Garg, and Vipin Kumar
R12: Anomaly/Template Detection
Tuesday 10:30 am ~ 11:50 am, Crystal
Kr447 | Weighting versus Pruning in Rule Validation for Detecting Network and Host Anomalies | Gaurav Tandon and Philip Chan
Kr449 | Cost-effective Outbreak Detection in Networks | Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance
Kr204 | Joint Optimization of Wrapper Generation and Template Detection | Shuyi Zheng, Ruihua Song, Di Wu, and Ji-Rong Wen
Kr333 | Detecting Anomalous Records in Categorical Datasets | Kaustav Das and Jeff Schneider
R13: Web/Text Mining (V)
Tuesday 2:00 pm ~ 3:20 pm, Regency 1
Kr291 | Mining Correlated Bursty Topic Patterns from Coordinated Text Streams | Xuanhui Wang, ChengXiang Zhai, Xiao Hu, and Richard Sproat
Kr551 | Mining Templates from Search Result Records of Search Engines | Hongkun Zhao, Weiyi Meng, and Clement Yu
Kr607 | Exploiting Underrepresented Query Aspects for Automatic Query Expansion | Daniel Crabtree, Peter Andreae, and Xiaoying Gao
Kr790 | Canonicalization of Database Records using Adaptive Similarity Measures | Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli, and Andrew McCallum
R14: Statistical Methods (I)
Tuesday 2:00 pm ~ 3:20 pm, Regency 2
Kr352 | Hierarchical Mixture Models: a Probabilistic Analysis | Mark Sandler
Kr398 | Information distance from a question to an answer | Xian Zhang, Yu Hao, Xiaoyan Zhu, and Ming Li
Kr420 | Statistical Change Detection for Multi-Dimensional Data | Xiuyao Song, Mingxi Wu, Chris Jermaine, and Sanjay Ranka
Kr424 | Learning the Kernel Matrix in Discriminant Analysis via Quadratically Constrained Quadratic Programming | Jieping Ye, Shuiwang Ji, and Jianhui Chen
R15: Clustering (III)
Tuesday 2:00 pm ~ 3:20 pm, Crystal
Kr434 | Constraint-Driven Clustering | Rong Ge, Martin Ester, Wen Jin, and Ian Davidson
Kr507 | A Spectral Clustering Approach to Optimally Combining Numerical Vectors with a Modular Network | Motoki Shiga, Ichigaku Takigawa, and Hiroshi Mamitsuka
Kr520 | Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis | Frizo Janssens, Wolfgang Glänzel, and Bart De Moor
Kr606 | Enhancing Semi-Supervised Clustering: A Feature Projection Perspective | Wei Tang, Hui Xiong, Shi Zhong, and Jie Wu
R16: Mining Data Streams
Tuesday 3:50 pm ~ 5:10 pm, Regency 1
Kr270 | Density-Based Clustering of Real-Time Stream Data | Yixin Chen and Li Tu
Kr430 | On String Classification in Data Streams | Charu Aggarwal and Philip S. Yu
Kr305 | A fast algorithm for finding frequent episodes in event streams | Srivatsan Laxman, Sastry P. S., and Unnikrishnan K. P.
Kr660 | Practical Learning from One Sided Feedback | D. Sculley
R17: Statistical Methods (II)
Tuesday 3:50 pm ~ 5:10 pm, Regency 2
Kr437 | Scalable Look-Ahead Linear Regression Trees | David Vogel, Ognian Asparouhov, and Tobias Scheffer
Kr475 | Estimating Rates of Rare Events at Multiple Resolutions | Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, and Mayssam Sayyadian
Kr717 | Predictive Discrete Latent Factor Models for Large Scale Dyadic Data | Deepak Agarwal and Srujana Merugu
Kr491 | A Scalable Modular Convex Solver for Regularized Risk Minimization | Quoc Le, Alex Smola, Choon Hui Teo, and Vishwanathan S V N
R18: Clustering (IV)
Tuesday 3:50 pm ~ 5:10 pm, Crystal
Kr664 | BoostCluster: Boosting Clustering by Pairwise Constraints | Yi Liu, Rong Jin, Anil Jain, and Pavan Mallapragada
Kr683 | Nonlinear Adaptive Distance Metric Learning for Clustering | Jianhui Chen, Zheng Zhao, Jieping Ye, and Huan Liu
Kr704 | A Framework for Simultaneous Co-clustering and Learning from Complex Data | Meghana Deodhar and Joydeep Ghosh
Kr773 | Joint Cluster Analysis of Attribute and Relationship Data Without A-Priori Specification of the Number of Clusters | Flavia Moser, Rong Ge, and Martin Ester
R19: Temporal Data Mining
Wednesday 9:00 am ~ 10:40 am, Regency 1
Kr560 | Stochastic Processes and Temporal Data Mining | Paul Cotofrei and Kilian Stoffel
Kr570 | Characterising the Difference | Jilles Vreeken, Matthijs van Leeuwen, and Arno Siebes
Kr600 | Structural and Temporal Analysis of the Blogosphere Through Community Factorization | Yun Chi, Shenghuo Zhu, Xiaodan Song, Junichi Tatemura, and Belle Tseng
Kr627 | Time-Dependent Event Hierarchy Construction | Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Huan Liu, and Philip S. Yu
Kr687 | GraphScope: Parameter-free Mining of Large Time-evolving Graphs | Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos
R20: Classification (II)
Wednesday 9:00 am ~ 10:40 am, Regency 2
Kr546 | Mining Optimal Decision Trees from Itemset Lattices | Siegfried Nijssen and Elisa Fromont
Kr614 | Partial Example Acquisition in Cost-Sensitive Learning | Victor S. Sheng and Charles X. Ling
Kr778 | Model-Shared Subspace Boosting for Multi-label Classification | Rong Yan, Jelena Tesic, and John Smith
Kr848 | Semi-Supervised Classification with Hybrid Generative/Discriminative Methods | Gregory Druck, Chris Pal, Xiaojin Zhu, and Andrew McCallum
Kr875 | Making Generative Classifiers Robust to Selection Bias | Andrew Smith and Charles Elkan
R21: Statistical Methods (III)
Wednesday 9:00 am ~ 10:40 am, Crystal
Kr540 | Privacy-Preservation for Gradient Descent Methods | Li Wan, Wee Keong Ng, Shuguo Han, and Vincent Lee
Kr423 | Enhanced Max Margin Learning on Multimodal Data Mining in a Multimedia Database | Zhen Guo, Zhongfei Zhang, Eric Xing, and Christos Faloutsos
Kr841 | Very Sparse Stable Random Projections for Dimension Reduction in the L-alpha Norm (where 0 < alpha <= 2) | Ping Li
Kr854 | Discovering the Hidden Structure of House Prices with a Non-Parametric Latent Manifold Model | Sumit Chopra, Trivikraman Thampy, John Leahy, Andrew Caplin, and Yann LeCun
I1: Data Mining Techniques
Monday 10:30 am ~ 12:30 pm, Regent Club
Invited industrial presentation 1 (Bharat Rao)
Extracting Relevant Named Entities for Automated Expense Reimbursement (Guangyu Zhu, Timothy Bethea, and Vikas Krishna)
Cleaning Disguised Missing Data: A Heuristic Approach (Ming Hua, Jian Pei)
Distributed Classification in Peer-to-Peer Networks (Ping Luo, Hui Xiong)
I2: Data mining on the web
Monday 2:00 pm - 3:20 pm, Regent Club
Corroborate and Learn Facts from the Web (Shubin Zhao, Jonathan Betz)
iLink: Search and Routing in Social Networks (Jeffrey Davitz, Jiye Yu, Sugato Basu, David Gutelius, Alexandra Harris)
Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO (Ron Kohavi, Randal M Henne, Dan Sommerfield)
I3: User behavior mining
Monday 3:50 pm - 5:10 pm, Regent Club
Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection (Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor, David Jensen, Henry Goldberg, John Komoroske)
An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs (Sitaram Asur, Srinivasan Parthasarathy, Duygu Ucar)
High Quantile Modeling for Customer Wallet Estimation with Other Applications (Claudia Perlich, Saharon Rosset, Richard Lawrence, and Bianca Zadrozny)
I4: Data mining applications
Tuesday 10:30 am - 11:50 am, Regent Club
Invited industrial presentation 2 (Joshua Goodman)
Mining complex power networks for blackout prevention (JunHua Zhao, ZhaoYang Dong, Pei Zhang)
On-board Analysis of Uncalibrated Data for a Spacecraft at Mars (Rebecca Castano, Kiri Wagstaff, Steve Chien, Timothy Stough, Benyang Tang)
I5: Short presentations
Tuesday 2:00 pm - 3:20 pm, Regent Club
Domain-Constrained Semi-Supervised Mining of Tracking Models in Sensor Networks (Rong Pan, Junhui Zhao, Wenchen Zheng, Jeffrey Junfeng Pan, Dou Shen, Jialin Pan, Qiang Yang)
Framework for Classification and Segmentation of Massive Audio Data Streams (Charu Aggarwal)
LungCAD: A Clinically Approved, Machine Learning System for Lung Cancer Detection (R Bharat Rao, Jinbo Bi, Glenn Fung, Marcos Salganicoff, Nancy Obuchowski, David Naidich)
Truth Discovery with Multiple Conflicting Information Providers on the Web (Xiaoxin Yin, Jiawei Han, and Philip S. Yu)
Detecting Changes in Large Data Sets of Payments Cards Data: A Case Study (Robert Grossman, Joseph Bugajski, Chris Curry, David Locke, and Steve Vejcik)
Event Summarization for System Management (Wei Peng, Charles Perng, Tao Li, and Haixun Wang)
Machine Learning for Stock Selection (Robert Yan and Charles X. Ling)
IMDS: Intelligent Malware Detection System (Yanfang Ye, Dingding Wang, Tao Li, Dongyi Ye)
At KDD-07 all accepted papers can be accompanied by
a poster presentation. We strongly encourage making use of this
opportunity to give the attendees an additional chance to discuss
your work with you.
In order to better plan the poster sessions it is mandatory
that you register your poster by sending a short note to Michael Berthold
at berthold@ieee.org including the paper ID, the title of your poster
and the author presenting it during the poster session. We will only
reserve space for registered posters!
Posters will have to fit within a 3' x 4' (roughly 90cm x 120cm)
area.
Ramasamy Uthurusamy (General Motors, USA), Chair
Jerome Friedman (Stanford University, USA)
Jiawei Han (University of Illinois Urbana-Champaign, USA)
Vipin Kumar (University of Minnesota, USA)
Heikki Mannila (University of Helsinki, Finland)
Rajeev Motwani (Stanford University, USA)
Ramakrishnan Srikant (Google, USA)
Ian H. Witten and Eibe Frank (University of Waikato, New Zealand)
Xindong Wu (University of Vermont, USA)