KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SESSION: Keynote Talks

Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years

MOOCS: What Have We Learned?

Machine Learning and Causal Inference for Policy Evaluation

Data, Knowledge and Discovery: Machine Learning meets Natural Science

SESSION: Research Paper Presentations (Part 1)

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

TimeMachine: Timeline Generation for Knowledge-Base Entities

Estimating Local Intrinsic Dimensionality

Portraying Collective Spatial Attention in Twitter

Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy

Efficient Online Evaluation of Big Data Stream Classifiers

Dynamically Modeling Patient's Health State from Electronic Medical Records: A Time Series Approach

Facets: Fast Comprehensive Mining of Coevolving High-order Time Series

Online Outlier Exploration Over Large Datasets

BatchRank: A Novel Batch Mode Active Learning Framework for Hierarchical Classification

On the Formation of Circles in Co-authorship Networks

Heterogeneous Network Embedding via Deep Architectures

Differentially Private High-Dimensional Data Publication via Sampling-Based Inference

Efficient Algorithms for Public-Private Social Networks

Warm Start for Parameter Selection of Linear Classifiers

Stream Sampling for Frequency Cap Statistics

Adaptation Algorithm and Theory Based on Generalized Discrepancy

Optimal Action Extraction for Random Forests and Boosted Trees

Dynamic Matrix Factorization with Priors on Unknown Values

CoupledLP: Link Prediction in Coupled Networks

Unsupervised Feature Selection with Adaptive Structure Learning

Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Hierarchical Graph-Coupled HMMs for Heterogeneous Personalized Health Data

More Constraints, Smaller Coresets: Constrained Matrix Approximation of Sparse Big Data

Certifying and Removing Disparate Impact

RSC: Mining and Modeling Temporal Activity in Social Media

A Clustering-Based Framework to Control Block Sizes for Entity Resolution

Who Supported Obama in 2012?: Ecological Inference through Distribution Regression

Real Estate Ranking via Mixed Land-use Latent Models

Adaptive Message Update for Fast Affinity Propagation

Monitoring Least Squares Models of Distributed Streams

Reconstructing Textual Documents from n-grams

Anatomical Annotations for Drosophila Gene Expression Patterns via Multi-Dimensional Visual Descriptors Integration: Multi-Dimensional Feature Learning

Selective Hashing: Closing the Gap between Radius Search and k-NN Search

Using Local Spectral Methods to Robustify Graph-Based Learning Algorithms

Instance Weighting for Patient-Specific Risk Stratification Models

A Deep Hybrid Model for Weather Forecasting

Network Lasso: Clustering and Optimization in Large Graphs

Learning Tree Structure in Multi-Task Learning

Probabilistic Community and Role Model for Social Networks

Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering

Non-exhaustive, Overlapping Clustering via Low-Rank Semidefinite Programming

Inferring Air Quality for Station Location Recommendation Based on Urban Big Data

Website Optimization Problem and Its Solutions

Reciprocity in Social Networks with Capacity Constraints

Learning with Similarity Functions on Graphs using Matchings of Geometric Embeddings

Structured Hedging for Resource Allocations with Leverage

Improved Bounds on the Dot Product under Random Projection and Random Sign Projection

Accelerated Alternating Direction Method of Multipliers

Deep Computational Phenotyping

Leveraging Social Context for Modeling Topic Evolution

Scalable Blocking for Privacy Preserving Record Linkage

Real Time Recommendations from Connoisseurs

Towards Decision Support and Goal Achievement: Identifying Action-Outcome Relationships From Social Media

On Estimating the Swapping Rate for Categorical Data

Simultaneous Discovery of Common and Discriminative Topics via Joint Nonnegative Matrix Factorization

A Decision Tree Framework for Spatiotemporal Sequence Prediction

TOPTRAC: Topical Trajectory Pattern Mining

From Group to Individual Labels Using Deep Features

VEWS: A Wikipedia Vandal Early Warning System

Unified and Contrasting Cuts in Multiple Graphs: Application to Medical Imaging Segmentation

Reducing the Unlabeled Sample Complexity of Semi-Supervised Multi-View Learning

Maximum Likelihood Postprocessing for Differential Privacy under Consistency Constraints

Online Influence Maximization

The Child is Father of the Man: Foresee the Success at the Early Stage

0-Bit Consistent Weighted Sampling

On the Discovery of Evolving Truth

MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams

A Learning-based Framework to Handle Multi-round Multi-party Influence Maximization on Social Networks

Temporal Phenotyping from Longitudinal Electronic Health Records: A Graph Based Framework

Spectral Ensemble Clustering

Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing

Influence at Scale: Distributed Computation of Complex Contagion in Networks

FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation

Algorithmic Cartography: Placing Points of Interest and Ads on Maps

Dimensionality Reduction Via Graph Structure Learning

Robust Treecode Approximation for Kernel Machines

Inferring Networks of Substitutable and Complementary Products

SESSION: Research Paper Presentations (Part 2)

Data-Driven Activity Prediction: Algorithms, Evaluation Methodology, and Applications

Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling

Graph Query Reformulation with Diversity

Flexible and Robust Multi-Network Clustering

Extreme States Distribution Decomposition Method for Search Engine Online Evaluation

Simultaneous Modeling of Multiple Diseases for Mortality Prediction in Acute Hospital Care

Fast and Robust Parallel SGD Matrix Factorization

Efficient PageRank Tracking in Evolving Networks

Quick Sensitivity Analysis for Incremental Data Modification and Its Application to Leave-one-out CV in Linear Classification Problems

Non-transitive Hashing with Latent Similarity Components

Optimal Kernel Group Transformation for Exploratory Regression Analysis and Graphics

Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning

Subspace Clustering Using Log-determinant Rank Approximation

A PCA-Based Change Detection Framework for Multidimensional Data Streams: Change Detection in Multidimensional Data Streams

State-Driven Dynamic Sensor Selection and Prediction with State-Stacked Sparseness

SCRAM: A Sharing Considered Route Assignment Mechanism for Fair Taxi Route Recommendations

Locally Densest Subgraph Discovery

Virus Propagation in Multiple Profile Networks

Collective Opinion Spam Detection: Bridging Review Networks and Metadata

ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering

Mining Frequent Itemsets through Progressive Sampling with Rademacher Averages

Why It Happened: Identifying and Modeling the Reasons of the Happening of Social Events

Matrix Completion with Queries

Stochastic Divergence Minimization for Online Collapsed Variational Bayes Zero Inference of Latent Dirichlet Allocation

Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts

TimeCrunch: Interpretable Dynamic Graph Summarization

Inside Jokes: Identifying Humorous Cartoon Captions

Community Detection based on Distance Dynamics

Discovery of Meaningful Rules in Time Series

An Evaluation of Parallel Eccentricity Estimation Algorithms on Undirected Real-World Graphs

Efficient Latent Link Recommendation in Signed Networks

Turn Waste into Wealth: On Simultaneous Clustering and Cleaning over Dirty Data

Set Cover at Web Scale

Exploiting Relevance Feedback in Knowledge Graph Search

LINKAGE: An Approach for Comprehensive Risk Prediction for Care Management

Transitive Transfer Learning

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

An Effective Marketing Strategy for Revenue Maximization with a Quantity Constraint

Scaling Up Stochastic Dual Coordinate Ascent

Discovering Valuable Items from Massive Data

Deep Learning Architecture with Dynamically Programmed Layers for Brain Connectome Prediction

Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks

Towards Interactive Construction of Topical Hierarchy: A Recursive Tensor Decomposition Approach

Collaborative Deep Learning for Recommender Systems

Trading Interpretability for Accuracy: Oblique Treed Sparse Additive Models

Geo-SAGE: A Geographical Sparse Additive Generative Model for Spatial Item Recommendation

Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics

Regularity and Conformity: Location Prediction Using Heterogeneous Mobility Data

Dynamic Poisson Autoregression for Influenza-Like-Illness Case Count Prediction

Cinema Data Mining: The Smell of Fear

Predicting Winning Price in Real Time Bidding with Censored Data

Diversifying Restricted Boltzmann Machine for Document Modeling

Edge-Weighted Personalized PageRank: Breaking A Decade-Old Performance Barrier

Petuum: A New Platform for Distributed Machine Learning on Big Data

Longitudinal LASSO: Jointly Learning Features and Temporal Contingency for Outcome Prediction

Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems

Deep Graph Kernels

Model Multiple Heterogeneity via Hierarchical Multi-Latent Space Learning

Structural Graphical Lasso for Learning Mouse Brain Connectivity

Entity Matching across Heterogeneous Sources

An Efficient Semi-Supervised Clustering Algorithm with Sequential Constraints

Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data

Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data

Organizational Chart Inference

Panther: Fast Top-k Similarity Search on Large Networks

A Collective Bayesian Poisson Factorization Model for Cold-start Local Event Recommendation

Statistical Arbitrage Mining for Display Advertising

Deep Model Based Transfer and Multi-Task Learning for Biological Image Analysis

COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency

SAME but Different: Fast and High Quality Gibbs Parameter Estimation

Multi-Task Learning for Spatio-Temporal Event Forecasting

SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

Linear Time Samplers for Supervised Topic Models using Compositional Proposals

L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data

Modeling Truth Existence in Truth Discovery

Cuckoo Linear Algebra

Integrating Vertex-centric Clustering with Edge-centric Clustering for Meta Path Graph Analysis

Modeling User Mobility for Location Promotion in Location-based Social Networks

Co-Clustering based Dual Prediction for Cargo Pricing Optimization

Debiasing Crowdsourced Batches

Query Workloads for Data Series Indexes

SESSION: Industry & Government Track Invited Talks

Scaling Machine Learning and Statistics for Web Applications

Hadoop's Impact on the Future of Data Management

Should You Trust Your Money to a Robot?

Data Science at Visa

How Artificial Intelligence and Big Data Created Rocket Fuel: A Case Study

Optimizing Marketing Impact through Data Driven Decisioning

Powering Real-time Decision Engines in Finance and Healthcare using Open Source Software

Clouded Intelligence

Data Science from the Lab to the Field to the Enterprise

User Modeling in Telecommunications and Internet Industry

SESSION: Industry & Government Track Papers

The Effectiveness of Marketing Strategies in Social Media: Evidence from Promotional Events

Personalizing LinkedIn Feed

Whither Social Networks for Web Search?

Exploiting Data Mining for Authenticity Assessment and Protection of High-Quality Italian Wines from Piedmont

Predictive Approaches for Low-Cost Preventive Medicine Program in Developing Countries

Dynamic Hierarchical Classification for Patient Risk-of-Readmission

ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments

Multi-View Incident Ticket Clustering for Optimal Ticket Dispatching

Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission

User Conditional Hashtag Prediction for Images

Big Data System for Analyzing Risky Procurement Entities

Probabilistic Modeling of a Sales Funnel to Prioritize Leads

Online Topic-based Social Influence Analysis for the Wimbledon Championships

Collective Spammer Detection in Evolving Multi-Relational Social Networks

Utilizing Text Mining on Online Medical Forums to Predict Label Change due to Adverse Drug Reactions

One-Pass Ranking Models for Low-Latency Product Recommendations

On the Reliability of Profile Matching Across Large Online Social Networks

E-commerce in Your Inbox: Product Recommendations at Scale

Gender and Interest Targeting for Sponsored Post Advertising at Tumblr

Mining Administrative Data to Spur Urban Revitalization

Measuring Causal Impact of Online Actions via Natural Experiments: Application to Display Advertising

Focusing on the Long-term: It's Good for Users and Business

Traffic Measurement and Route Recommendation System for Mass Rapid Transit (MRT)

Real-Time Bid Prediction using Thompson Sampling-Based Expert Selection

Life-stage Prediction for Product Recommendation in E-commerce

Visual Search at Pinterest

Discovering Collective Narratives of Theme Parks from Large Collections of Visitors' Photo Streams

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes

Probabilistic Graphical Models of Dyslexia

Promoting Positive Post-Click Experience for In-Stream Yahoo Gemini Users

Generic and Scalable Framework for Automated Time-series Anomaly Detection

Leveraging Knowledge Bases for Contextual Entity Exploration

Click-through Prediction for Advertising in Twitter Timeline

Predicting Voice Elicited Emotions

Discovery of Glaucoma Progressive Patterns Using Hierarchical MDL-Based Clustering

Distributed Personalization

Voltage Correlations in Smart Meter Data

Analyzing Invariants in Cyber-Physical Systems using Latent Factor Regression

Predicting Future Scientific Discoveries Based on a Networked Analysis of the Past Literature

Learning a Hierarchical Monitoring System for Detecting and Diagnosing Service Issues

Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning

Proof Protocol for a Machine Learning Technique Making Longitudinal Predictions in Dynamic Contexts

An Architecture for Agile Machine Learning in Real-Time Applications

Scalable Machine Learning Approaches for Neighborhood Classification Using Very High Resolution Remote Sensing Imagery

Early Identification of Violent Criminal Gang Members

Spoken English Grading: Machine Learning with Crowd Intelligence

Effective Audience Extension in Online Advertising

Going In-Depth: Finding Longform on the Web

Early Prediction of Cardiac Arrest (Code Blue) using Electronic Medical Records

When-To-Post on Social Networks

Mining for Causal Relationships: A Data-Driven Study of the Islamic State

Transfer Learning for Bilingual Content Classification

FrauDetector: A Graph-Mining-based Framework for Fraudulent Phone Call Detection

Efficient Long-Term Degradation Profiling in Time Series for Complex Physical Systems

Interpreting Advertiser Intent in Sponsored Search

Client Clustering for Hiring Modeling in Work Marketplaces

Discerning Tactical Patterns for Professional Soccer Teams: An Enhanced Topic Model with Applications

Predicting Serves in Tennis using Style Priors

Smart Pacing for Effective Online Ad Campaign Optimization

From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks

Tornado Forecasting with Multiple Markov Boundaries

Gas Concentration Reconstruction for Coal-Fired Boilers Using Gaussian Process

Annotating Needles in the Haystack without Looking: Product Information Extraction from Emails

Forecasting Fine-Grained Air Quality Based on Big Data

Building Discriminative User Profiles for Large-scale Content Recommendation

Stock Constrained Recommendation in Tmall

Predicting Ambulance Demand: a Spatio-Temporal Kernel Approach


SESSION: Tutorials

Web Personalization and Recommender Systems

Graph-Based User Behavior Modeling: From Prediction to Fraud Detection

Data-Driven Product Innovation

Dense Subgraph Discovery: KDD 2015 tutorial

Diffusion in Social and Information Networks: Research Problems, Probabilistic Models and Machine Learning Methods

Social Media Anomaly Detection: Challenges and Solutions

Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach

VC-Dimension and Rademacher Averages: From Statistical Learning Theory to Sampling Algorithms

Large Scale Distributed Data Science using Apache Spark

Medical Mining: KDD 2015 Tutorial

Big Data Analytics: Optimization and Randomization


SESSION: Panel

Data Driven Science: SIGKDD Panel