Conventional Tutorials

We have a fantastic lineup of conventional tutorials to be held in conjunction with KDD 2019. Check back as we get closer to the conference for more detailed program information.


  • Deep Bayesian Mining, Learning and Understanding

    This tutorial addresses advances in deep Bayesian learning for natural language, with ubiquitous applications ranging from speech recognition to document summarization, text classification, text segmentation, information extraction, image caption generation, sentence generation, dialogue control, sentiment classification, recommendation systems, question answering and machine translation, to name a few. Traditionally, “deep learning” is taken to be a learning process where the inference or optimization is based on a real-valued deterministic model. The “semantic structure” in words, sentences, entities, actions and documents drawn from a large vocabulary may not be well expressed or correctly optimized in mathematical logic or computer programs. The “distribution function” in a discrete or continuous latent variable model for natural language may not be properly decomposed or estimated. This tutorial addresses the fundamentals of statistical models and neural networks, and focuses on a series of advanced Bayesian models and deep models, including the hierarchical Dirichlet process, Chinese restaurant process, hierarchical Pitman-Yor process, Indian buffet process, recurrent neural network, long short-term memory, sequence-to-sequence model, variational auto-encoder, generative adversarial network, attention mechanism, memory-augmented neural network, skip neural network, stochastic neural network, predictive state neural network, and policy neural network. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in natural language. Variational inference and sampling methods are formulated to tackle the optimization of complicated models. Word and sentence embeddings, clustering and co-clustering are merged with linguistic and semantic constraints. A series of case studies are presented to tackle different issues in deep Bayesian mining, learning and understanding. Finally, we will point out a number of directions and outlooks for future studies.


    Jen-Tzung Chien (National Chiao Tung University, Taiwan)

    Tentative timeslot: PM Website
  • Gold Panning from the Mess: Rare Category Exploration, Exposition, Representation and Interpretation

    In contrast to the massive volume of data, it is often the rare categories that are of great importance in many high-impact domains, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, and from spam image detection in social media to rare disease diagnosis in medical decision support systems. The unique challenges of rare category analysis include (1) the highly skewed class-membership distribution; (2) the non-separability of the rare categories from the majority classes; (3) data and task heterogeneity, e.g., the multi-modal representation of examples, and the analysis of similar rare categories across multiple related tasks. This tutorial aims to provide a concise review of state-of-the-art techniques for complex rare category analysis, where the majority classes have a smooth distribution while the minority classes exhibit a compactness property in the feature space. In particular, we start with the context, problem definition and unique challenges of complex rare category analysis. We then present a comprehensive overview of recent advances designed for this problem setting, from rare category exploration without any label information to the exposition step that characterizes rare examples in a compact representation, and from representing rare patterns in a salient embedding space to interpreting the prediction results and providing relevant clues for the end users’ investigation. Finally, we discuss the potential challenges and shed light on the future directions of complex rare category analysis.


    Dawei Zhou (UIUC)
    Jingrui He (UIUC)

    Tentative timeslot: AM Website
  • Data Mining Methods for Drug Discovery and Development

    In silico modeling of medicine refers to the direct use of computational methods in support of drug discovery and development. Machine learning and data mining methods have become an integral part of in silico modeling and have demonstrated promising performance at various phases of the drug discovery and development process. In this tutorial we will introduce data analytic methods in drug discovery and development. In the first half, we will provide an overview of the related data and analytic tasks, and then present the enabling data analytic methods for these tasks. In the second half, we will describe concrete applications of each of those tasks. The tutorial will conclude with open problems and a Q&A session.


    Cao (Danica) Xiao (IQVIA)
    Jimeng Sun (Georgia Tech)

    Tentative timeslot: PM Website
  • Mining and model understanding on medical data

    Medical research and patient caretaking are increasingly benefiting from advances in machine learning. The penetration of smart technologies and the Internet of Things gives a further boost to initiatives for patient self-management and empowerment: new forms of health-relevant data become available and require new data acquisition and analytics workflows. As data complexity and model sophistication increase, model interpretability becomes mission-critical. But what constitutes model interpretation in the context of medical machine learning: what are the questions for which KDD should provide interpretable answers? In this tutorial, we discuss basic forms of health-related data – Electronic Health Records, cohort data from population-based studies and clinical studies, mHealth recordings, and data from internet-based studies. We elaborate on the questions that medical researchers and clinicians pose on those data, and on the instruments they use – giving some emphasis to the instruments “population-based study” and “Randomized Clinical Trial”. We elaborate on what questions are asked with those instruments, on what questions can be answered from those data, on ML advances and achievements on such data, and on ways of responding to the medical experts’ questions about the derived models.


    Myra Spiliopoulou (Univ Magdeburg)
    Panos Papapetrou (Univ Stockholm)

    Tentative timeslot: PM Website
  • Constructing and Mining Heterogeneous Information Networks from Massive Text

    Real-world data exists largely in the form of unstructured text. A grand challenge in data mining research is to develop effective and scalable methods that transform *unstructured text* into *structured knowledge*. In our vision, it is highly beneficial to transform such text into *structured heterogeneous information networks*, from which *actionable knowledge* can be generated based on the user’s need. In this tutorial, we provide a comprehensive overview of recent research and development in this direction. First, we introduce a series of effective methods that construct *heterogeneous information networks* from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks based on the user’s need. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.


    Jingbo Shang (UIUC)
    Jiaming Shen (UIUC)
    Liyuan Liu (UIUC)
    Jiawei Han (UIUC)

    Tentative timeslot: PM Website
  • Incompleteness in Networks: Biases, Skewed Results, and Some Solutions

    Network analysis is often conducted on incomplete samples of much larger, fully observed networks (which are supposed to represent some phenomena of interest). For example, many researchers obtain networks from online data repositories without knowing how the networks were collected. Such networks can be poor representations of the fully observed phenomena. More complete data would lead to more accurate analyses, but data acquisition can be at best costly and at worst error-prone (e.g., consider an adversary that deliberately poisons the answer to a query). Past work on this topic has forked into two branches: (1) assume a network model and use the observed data to infer the unobserved data; and (2) do not assume a network model, and instead use the observed data to learn a policy for data acquisition given a query budget. We focus on the second branch. That is, given a query budget for identifying additional nodes and edges, how can one improve the observed network sample so that it is a more accurate representation of the fully observed network? This problem is related to, but distinct from, topics such as graph sampling and crawling. In this tutorial, we will discuss latent biases in incomplete networks and present methods for enriching such networks through active probing of nodes and edges. We will focus on multi-armed bandit, online active learning, and Markov decision process formulations of this problem (a.k.a. the network discovery problem); and clarify distinctions between learning to grow the network (a.k.a. active exploration) and learning the “best” function on the network (a.k.a. active learning).
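    As a toy illustration of the second branch, the sketch below spends a query budget by probing the observed node with the highest partially observed degree, occasionally exploring at random. This is a minimal ε-greedy heuristic on a made-up graph, not any specific method from the tutorial; a bandit formulation would instead learn which structural cues predict a high marginal gain.

```python
import random

def probe(full_adj, node):
    """Querying a node reveals all of its edges in the full network."""
    return set(full_adj[node])

def grow_sample(full_adj, seed_nodes, budget, eps=0.2, rng=None):
    """Epsilon-greedy network discovery: each query probes the observed
    node with the highest degree in the partial sample (exploit), or a
    random unprobed observed node (explore), then adds revealed edges."""
    rng = rng or random.Random(0)
    observed_nodes = set(seed_nodes)
    observed_edges = set()
    probed = set()
    for _ in range(budget):
        candidates = [n for n in observed_nodes if n not in probed]
        if not candidates:
            break
        if rng.random() < eps:
            target = rng.choice(candidates)  # explore
        else:
            # exploit: highest degree seen so far in the partial sample
            target = max(candidates,
                         key=lambda n: sum(1 for e in observed_edges if n in e))
        probed.add(target)
        for nbr in probe(full_adj, target):
            observed_edges.add(frozenset((target, nbr)))
            observed_nodes.add(nbr)
    return observed_nodes, observed_edges

# toy "fully observed" network: a hub (node 0) plus a pendant path 3-5
full = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0, 5], 4: [0], 5: [3]}
nodes, edges = grow_sample(full, seed_nodes=[3], budget=3)
```

    Starting from the single seed node 3, three probes suffice to recover the whole toy network, because the greedy rule quickly steers the budget toward the hub.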


    Tina Eliassi-Rad (Northeastern University)
    Rajmonda Caceres (MIT Lincoln Laboratory)
    Timothy LaRock (Northeastern University)

    Tentative timeslot: PM Website
  • Optimize the Wisdom of the Crowd: Inference, Learning, and Teaching

    The increasing need for labeled data has fueled the booming growth of crowdsourcing in a wide range of high-impact real-world applications, such as collaborative knowledge (e.g., data annotation, language translation), collective creativity (e.g., analogy mining, crowdfunding), and reverse Turing tests (e.g., CAPTCHA-like systems). In the context of supervised learning, crowdsourcing refers to the annotation process where data items are outsourced to and processed by a group of mostly unskilled online workers. Thus, researchers and organizations are able to collect a large amount of information from the feedback of the crowd in a short time and at a low cost.

    Despite the wide adoption of crowdsourcing services, several of its fundamental problems remain unsolved especially at the information and cognitive levels with respect to incentive design, information aggregation, and heterogeneous learning. This tutorial aims to: (1) provide a comprehensive review of recent advances in exploring the power of crowdsourcing from the perspective of optimizing the wisdom of the crowd; and (2) identify the open challenges and provide insights to the future trends in the context of human-in-the-loop learning. We believe this is an emerging and potentially high-impact topic in computational data science, which will attract both researchers and practitioners from academia and industry.
    Compared with previous tutorials and workshops on crowdsourcing, the emphasis of this tutorial will be placed on several aspects: (1) the history of and recent emerging techniques for the truth inference problem in crowdsourcing (i.e., inferring the ground-truth labels of the crowdsourced items); (2) active learning with imperfect oracles (i.e., answering the questions of which item should be labeled next and which oracle should be queried) and heterogeneous learning with multiple labelers (i.e., multi-task learning and multi-view learning with crowdsourced labels); and (3) supervising the crowd workers to learn and label in the form of teaching (i.e., teaching the crowdsourcing workers a concept such as labeling an image or categorizing a document).
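    To give a flavor of the truth inference problem in (1), the sketch below alternates between estimating item truths by weighted vote and re-estimating worker reliabilities from agreement with those truths. It is a deliberately simplified binary-label variant in the spirit of Dawid-Skene, not the tutorial's own algorithm; the data and worker names are invented.

```python
def weighted_vote_truth(labels, n_iter=10):
    """labels: list of (worker, item, binary_label) triples.
    Start from an unweighted majority vote, then iterate:
    re-weight each worker by agreement with the current truth
    estimates (below-chance workers get weight 0), and re-vote."""
    workers = sorted({w for w, _, _ in labels})
    items = sorted({i for _, i, _ in labels})
    weight = {w: 1.0 for w in workers}
    truth = {}
    for _ in range(n_iter):
        # weighted vote per item (E-step-like)
        for i in items:
            score = sum(weight[w] * (1 if y == 1 else -1)
                        for w, it, y in labels if it == i)
            truth[i] = 1 if score >= 0 else 0
        # reliability = fraction of a worker's answers matching the truths
        for w in workers:
            answers = [(it, y) for ww, it, y in labels if ww == w]
            acc = sum(1 for it, y in answers if truth[it] == y) / len(answers)
            weight[w] = max(acc - 0.5, 0.0)  # M-step-like re-weighting
    return truth, weight

# three hypothetical workers label four items; "spam" is mostly wrong
labels = [("alice", i, y) for i, y in enumerate([1, 0, 1, 1])] + \
         [("bob",   i, y) for i, y in enumerate([1, 0, 1, 0])] + \
         [("spam",  i, y) for i, y in enumerate([0, 1, 0, 0])]
truth, weight = weighted_vote_truth(labels)
```

    After a couple of iterations the unreliable worker's weight drops to zero, so the inferred truths follow the two consistent workers.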


    Yao Zhou (UIUC)
    Fenglong Ma (PSU)
    Jing Gao (U Buffalo)
    Jingrui He (UIUC)

    Tentative timeslot: PM Website
  • Interpretable Knowledge Discovery Reinforced by Visual Methods

    This tutorial will cover the state-of-the-art research, development, and applications in the KDD area of interpretable knowledge discovery reinforced by visual methods, to stimulate and facilitate future work. It will serve the KDD mission of gaining insight from data. The topic is interdisciplinary, bridging scientific research and applied communities in KDD, Visual Analytics, Information Visualization, and HCI. This is a novel and fast-growing area with significant applications and potential.


    Boris Kovalerchuk (CWU)

    Tentative timeslot: PM Website
  • Social User Interest Mining: Methods and Applications

    The abundance of user generated content on social networks provides the opportunity to build models that are able to accurately and effectively extract, mine and predict users’ interests with the hopes of enabling more effective user engagement, better quality delivery of appropriate services and higher user satisfaction. While traditional methods for building user profiles relied on AI-based preference elicitation techniques that could have been considered to be intrusive and undesirable by the users, more recent advances are focused on a non-intrusive yet accurate way of determining users’ interests and preferences. In this tutorial, we will cover five important aspects related to the effective mining of user interests:

    1) The information sources that are used for extracting user interests
    2) Various types of user interest profiles that have been proposed in the literature
    3) Techniques that have been adopted or proposed for mining user interests
    4) The scalability and resource requirements of the state of the art methods
    5) The evaluation methodologies that are adopted in the literature for validating the appropriateness of the mined user interest profiles.

    We will also introduce existing challenges, open research questions and exciting opportunities for further work.


    Fattane Zarrinkalam (Ryerson University)
    Hossein Fani (University of New Brunswick)
    Ebrahim Bagheri (Ryerson University)

    Tentative timeslot: AM Website
  • Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned

    Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine-learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and the evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a “fairness by design” approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice, by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we will present open problems and research directions for the data mining / machine learning community.
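    One of the simplest diagnostics that recurs in such fairness audits is demographic parity: do groups receive positive predictions at similar rates? A minimal sketch on made-up data (illustrative only; the metrics, thresholds, and legal requirements that apply vary by application and jurisdiction):

```python
def demographic_parity_gap(preds, groups):
    """Largest difference between any two groups' positive-prediction
    rates for binary predictions; 0 means all groups receive positive
    predictions at exactly the same rate."""
    by_group = {}
    for p, g in zip(preds, groups):
        by_group.setdefault(g, []).append(p)
    rates = {g: sum(v) / len(v) for g, v in by_group.items()}
    return max(rates.values()) - min(rates.values())

# hypothetical binary predictions for members of two groups
preds  = [1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]
gap = demographic_parity_gap(preds, groups)  # group a: 2/3, group b: 1/3
```

    A "fairness by design" workflow would monitor such a gap (among other, often more appropriate metrics) during model development rather than after deployment.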


    Sarah Bird (Microsoft, USA)
    Ben Hutchinson (Google, USA)
    Krishnaram Kenthapadi (LinkedIn, USA)
    Emre Kıcıman (Microsoft, USA)
    Margaret Mitchell (Google, USA)

    Tentative timeslot: AM Website
  • Explainable AI in Industry

    Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI-based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, and critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.

    As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.

    In this tutorial, we will present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we will focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning application domains such as hiring, sales, lending, and fraud detection. Finally, based on our experiences in industry, we will identify open problems and research directions for the data mining / machine learning community.
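    A widely used model-agnostic baseline for the explainability tasks above is permutation importance: shuffle one feature column and measure how much performance drops. The sketch below runs it on a deliberately trivial, invented model so the answer is obvious; it is our illustration, not a technique from any particular company's production stack.

```python
import random

def permutation_importance(predict, X, y, col, n_repeats=5, rng=None):
    """Mean drop in accuracy when column `col` of X is shuffled:
    a feature the model ignores scores 0, an essential one scores high."""
    rng = rng or random.Random(0)
    base = sum(predict(row) == t for row, t in zip(X, y)) / len(y)
    drops = []
    for _ in range(n_repeats):
        vals = [row[col] for row in X]
        rng.shuffle(vals)
        Xp = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, vals)]
        acc = sum(predict(r) == t for r, t in zip(Xp, y)) / len(y)
        drops.append(base - acc)
    return sum(drops) / n_repeats

# toy "model" that only ever looks at feature 0
predict = lambda row: 1 if row[0] > 0 else 0
X = [[1, 5], [-1, 5], [2, -3], [-2, -3]]
y = [1, 0, 1, 0]
imp0 = permutation_importance(predict, X, y, col=0)
imp1 = permutation_importance(predict, X, y, col=1)  # unused feature: 0.0
```

    Permutation importance explains global model behavior; the per-prediction explanations discussed in the tutorial require different machinery (e.g., local surrogate or attribution methods).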


    Krishna Gade (Fiddler Labs, USA)
    Sahin Cem Geyik (LinkedIn, USA)
    Krishnaram Kenthapadi (LinkedIn, USA)
    Varun Mithal (LinkedIn, USA)

    Tentative timeslot: PM Website
  • Foundations of large-scale sequential experimentation

    Large-scale sequential hypothesis testing (A/B testing) is rampant in the tech industry, with internet companies running hundreds of thousands of tests per year. About 6 years ago, Microsoft claimed that such experiments on Bing increased revenues on the order of hundreds of millions of dollars (Kohavi et al., 2013), and even 9 years ago, Google claimed that such experimentation was basically a mantra (Tang et al., 2010). This experimentation is actually “doubly sequential”, since it consists of a sequence of sequential experiments.

    In this tutorial, the audience will learn about the various problems encountered in large-scale, asynchronous, doubly-sequential experimentation, both for the inner sequential process (a single sequential test) and for the outer sequential process (the sequence of tests), and learn about recently developed principles to tackle these problems. We will discuss error metrics both within and across experiments, and present state-of-the-art methods that provably control these errors, both with and without resorting to parametric or asymptotic assumptions. In particular, we will demonstrate how current common practices of peeking and marginal testing fail to control errors both within and across experiments, but how these can be alleviated using simple yet nuanced changes to the experimentation setup. We will also briefly discuss the role of multi-armed bandit methods for testing hypotheses, and the potential pitfalls due to selection bias introduced by adaptive sampling.
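    The peeking problem is easy to reproduce. The simulation below (a self-contained illustration we constructed, not code from the tutorial) runs A/A experiments with no true effect and compares the false-positive rate of a single pre-registered z-test against the common practice of testing after every few observations and stopping at the first nominally significant result:

```python
import math
import random

def peeking_rejects(n_steps, rng, peek_every=10, z_crit=1.96):
    """One A/A experiment on +/-1 outcomes, 'peeking' at the z-statistic
    every peek_every observations and stopping at the first nominally
    significant value. The null is true, so any rejection is false."""
    s = 0.0
    for i in range(1, n_steps + 1):
        s += rng.choice((-1.0, 1.0))
        if i % peek_every == 0 and abs(s / math.sqrt(i)) > z_crit:
            return True
    return False

def fixed_rejects(n_steps, rng, z_crit=1.96):
    """The same experiment analyzed once, at the pre-registered horizon."""
    s = sum(rng.choice((-1.0, 1.0)) for _ in range(n_steps))
    return abs(s / math.sqrt(n_steps)) > z_crit

rng = random.Random(1)
trials = 2000
peek_rate = sum(peeking_rejects(500, rng) for _ in range(trials)) / trials
fixed_rate = sum(fixed_rejects(500, rng) for _ in range(trials)) / trials
# fixed_rate stays near the nominal 5%; peek_rate is several times larger
```

    Always-valid inference methods of the kind covered in the tutorial restore error control under continuous monitoring, at the price of wider confidence sequences.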


    Aaditya Ramdas (Carnegie Mellon University)

    Tentative timeslot: PM Website
  • Advances in Cost-sensitive Multiclass and Multilabel Classification

    Classification is an important problem in data mining and knowledge discovery. Traditionally, the regular classification problem aims at minimizing the rate of mis-predictions. Nevertheless, many real-world data mining applications attach varying costs to different types of mis-classification errors. For instance, mis-classifying a Gram-positive bacterium as a Gram-negative one leads to totally ineffective treatments and is hence more serious than mis-classifying a Gram-positive bacterium as another Gram-positive one. Such a cost-sensitive classification problem can be very different from the regular classification one, and arises in applications like targeted marketing, information retrieval, medical decision making, object recognition and intrusion detection.

    The cost-sensitive binary classification problem has been studied since the 1990s, resulting in sampling and re-weighting tools that continue to influence many real-world applications. In the past 20 years, researchers have advanced those tools to tackle more complicated problems, including multiclass and multilabel classification. The tutorial aims to review and summarize those advances to allow more real-world applications to enjoy the benefits of cost-sensitive classification. The advances range from Bayesian approaches that consider costs during inference, to reduction-based approaches that transform the cost-sensitive classification task into other tasks, to deep learning approaches that plug the costs into the optimization and feature-extraction process. We discuss the relationships between the approaches as well as their practical usage. We will also introduce some successes in data mining applications, such as improving the performance of a real-world bacteria classification system and tackling the class-imbalance problem of KDD Cup 1999.
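    For intuition on the Bayesian approaches mentioned above, the cost-sensitive decision at inference time is: instead of predicting the class with the highest posterior probability, predict the class with the lowest expected cost. A minimal sketch, with a made-up 3-class cost matrix echoing the bacteria example:

```python
def cost_sensitive_predict(posterior, cost):
    """Bayes-optimal cost-sensitive decision: pick the class j that
    minimizes sum_i posterior[i] * cost[i][j], i.e. the expected cost
    of predicting j when the true class is distributed as `posterior`."""
    n_classes = len(cost[0])
    expected = [sum(posterior[i] * cost[i][j] for i in range(len(posterior)))
                for j in range(n_classes)]
    return min(range(n_classes), key=expected.__getitem__)

# Hypothetical setup: classes 0 and 1 are Gram-positive, class 2 is
# Gram-negative; confusing positive with negative costs 10, confusing
# the two positive classes costs only 1, correct predictions cost 0.
cost = [[0, 1, 10],
        [1, 0, 10],
        [10, 10, 0]]
posterior = [0.35, 0.25, 0.40]
# argmax posterior would pick class 2, but the expected costs are
# [4.25, 4.35, 6.0], so the cost-sensitive decision is class 0
pred = cost_sensitive_predict(posterior, cost)
```

    The example shows why cost-sensitive and regular classification can disagree even with a perfectly calibrated posterior: the most probable class is also the most expensive one to get wrong.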


    Hsuan-Tien Lin (National Taiwan University)

    Tentative timeslot: PM Website
  • Modern MDL meets Data Mining -- Insights, Theory, and Practice

    When considering a data set, it is often unknown how complex it is, and hence how complex a model that describes or captures its main characteristics should be. Often these choices are swept under the carpet, ignored, or left to the domain expert, but in practice this is highly unsatisfactory; domain experts do not know how to set $k$, what prior to choose, or how many degrees of freedom are optimal any more than we do.

    The Minimum Description Length (MDL) principle answers the model selection problem from a clear and intuitively appealing viewpoint. In a nutshell, it asserts that the best model is the one that best compresses both the data and the model itself. In this tutorial we not only give an introduction to the very basics of model selection, showing important properties of MDL-based modelling, successful examples, and pitfalls in applying MDL to solve data mining problems, but also introduce advanced topics on important new concepts in modern MDL (e.g., normalized maximum likelihood (NML), sequential NML, decomposed NML, and MDL change statistics) and emerging applications in dynamic settings. Throughout this tutorial, our goal is to make sure that the audience not only grasps the basic theory, but also sees how it can be put into practice.
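    To make the compression view concrete, here is a small two-part-code sketch (our own toy construction, far simpler than the NML machinery covered in the tutorial) that selects the number of histogram bins $k$: each candidate is scored by a model cost plus the code length of the data under that model, and the shortest total description wins, so no one has to hand-pick $k$.

```python
import math
import random

def two_part_mdl(data, k):
    """Two-part description length, in bits, of data in [0, 1) under a
    k-bin histogram: (k-1)/2 * log2(n) bits for the bin probabilities
    (the classical asymptotic cost per real parameter) plus the
    negative log-likelihood of the data at the bin level.  A resolution
    term common to all k is dropped, so scores are comparable across k
    but not absolute (and may be negative)."""
    n = len(data)
    counts = [0] * k
    for x in data:
        counts[min(int(x * k), k - 1)] += 1
    model_bits = (k - 1) / 2 * math.log2(n)
    data_bits = -sum(c * math.log2((c / n) * k) for c in counts if c)
    return model_bits + data_bits

# bimodal synthetic data: two narrow Gaussian bumps, clipped to [0, 1)
rng = random.Random(0)
data = [rng.gauss(0.25, 0.05) for _ in range(300)] + \
       [rng.gauss(0.75, 0.05) for _ in range(300)]
data = [min(max(x, 0.0), 0.999) for x in data]
best_k = min(range(1, 61), key=lambda k: two_part_mdl(data, k))
# MDL picks an intermediate k: enough bins to capture the two modes,
# not so many that coding the extra parameters outweighs the better fit
```

    A single bin costs nothing to describe but compresses the data poorly; sixty bins fit beautifully but are expensive to encode. The minimum sits in between, which is exactly the trade-off the principle formalizes.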


    Jilles Vreeken (Helmholtz Center for Information Security)
    Kenji Yamanishi (U. Tokyo)

    Tentative timeslot: AM Website
  • Adaptive Influence Maximization

    Information diffusion and social influence are more and more present in today’s Web ecosystem. Having algorithms that optimize the presence and diffusion of messages on social media is crucial to all actors (media companies, political parties, corporations, etc.) who advertise on the Web. Motivated by the need for effective viral marketing strategies, influence estimation and influence maximization have therefore become important research problems, leading to a plethora of methods. However, the majority of these methods are non-adaptive, and therefore not appropriate for scenarios in which influence campaigns may be run and observed over multiple rounds, nor for scenarios which cannot assume full knowledge of the diffusion networks and the ways information spreads in them.

    In this tutorial we intend to present the recent research on adaptive influence maximization, which aims to address these limitations. This can be seen as a particular case of the influence maximization problem (where seeds in a social graph are selected to maximize information spread), one in which the decisions are taken as the influence campaign unfolds, over multiple rounds, and where knowledge about the graph topology and the influence process may be partial or even entirely missing. Depending on the underlying assumptions, this setting leads to varied and original approaches and algorithmic techniques, as witnessed in recent literature. We will review the most relevant research in this area, organizing it along several key dimensions, and discuss the methods’ advantages and shortcomings, along with open research questions and the practical aspects of their implementation.
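    For reference, the classical non-adaptive baseline that adaptive methods build on is greedy seed selection with Monte Carlo spread estimates under the independent cascade model. A compact sketch on a toy graph (graph, activation probability, and simulation counts are invented for illustration):

```python
import random

def simulate_ic(adj, seeds, p, rng):
    """One cascade under the independent cascade model: each newly
    activated node gets one chance to activate each of its neighbors
    with probability p. Returns the number of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_seeds(adj, k, p=0.3, n_sim=200, rng=None):
    """Non-adaptive greedy influence maximization: repeatedly add the
    node with the largest estimated gain in expected spread."""
    rng = rng or random.Random(0)
    seeds = []
    for _ in range(k):
        def gain(v):
            return sum(simulate_ic(adj, seeds + [v], p, rng)
                       for _ in range(n_sim))
        best = max((v for v in adj if v not in seeds), key=gain)
        seeds.append(best)
    return seeds

# toy graph: a hub (node 0) plus an isolated pair {5, 6}
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0], 5: [6], 6: [5]}
seeds = greedy_seeds(adj, k=1)
```

    The adaptive variants covered in the tutorial relax exactly the assumptions this sketch bakes in: that the graph and the diffusion probabilities are known up front, and that all seeds are committed in a single round.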


    Bogdan Cautis (U Paris-Sud, France)
    Silviu Maniu (U Paris-Sud, France)
    Nikolaos Tziortziotis (Tradelab and U Paris-Sud, France)

    Tentative timeslot: AM Website
  • Hypothesis Testing and Statistically-sound Pattern Mining

    The availability of massive datasets has highlighted the need for computationally efficient and statistically sound methods to extract patterns while providing rigorous guarantees on the quality of the results, in particular with respect to false discoveries.

    In this tutorial we survey recent methods that properly combine computational and statistical considerations to efficiently mine statistically reliable patterns from large datasets. We start by introducing the fundamental concepts in statistical hypothesis testing, which may not be familiar to everyone in the data mining community. We then explain how the computational and statistical challenges in pattern mining have been tackled in different ways. Finally, we describe the application of these methods in areas such as market basket analysis, subgraph mining, social networks analysis, and cancer genomics.

    The purpose of this tutorial is to introduce the audience to statistical hypothesis testing, to emphasize the importance of properly balancing the computational and statistical aspects of pattern mining, to highlight the usefulness of techniques that do so for the data mining researcher, and to encourage further research in this area.
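    To fix ideas on the statistical side, the sketch below tests each mined pattern for enrichment in cases versus controls with a right-tailed Fisher exact test and applies a Bonferroni correction over the m patterns tested, the simplest family-wise guarantee against false discoveries. The data and patterns are invented, and real significant-pattern miners use far sharper corrections than Bonferroni.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """Right-tailed Fisher exact test p-value for the 2x2 table
    [[a, b], [c, d]]: the probability, under independence, of seeing
    a or more co-occurrences in the top-left cell (hypergeometric tail)."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    return p

# hypothetical mined patterns: (name, support in cases, support in controls)
patterns = [("A", 30, 5), ("B", 18, 15), ("C", 25, 24)]
n_cases = n_controls = 50
alpha = 0.05
m = len(patterns)  # Bonferroni: test each pattern at level alpha / m
significant = [name for name, sc, st in patterns
               if fisher_right_tail(sc, n_cases - sc,
                                    st, n_controls - st) < alpha / m]
```

    Only the strongly enriched pattern survives the corrected threshold; the computational challenge the tutorial addresses is doing this soundly when m is in the millions rather than three.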


    Leonardo Pellegrina (University of Padova, Italy)
    Matteo Riondato (Amherst College, USA)
    Fabio Vandin (University of Padova, Italy)

    Tentative timeslot: AM Website
  • Fake News Research: Theories, Detection Strategies, and Open Problems

    The explosive growth of fake news and its erosion of democracy, justice, and public trust has increased the demand for fake news detection. As an interdisciplinary topic, the study of fake news encourages a concerted effort of experts in computer and information science, political science, journalism, social science, psychology, and economics. A comprehensive framework to systematically understand and detect fake news is necessary to attract and unite researchers in related areas to conduct research on fake news. This tutorial aims to clearly present (1) fake news research, its challenges, and research directions; (2) a comparison between fake news and related concepts (e.g., rumors); (3) the fundamental theories developed across various disciplines that facilitate interdisciplinary research; (4) various detection strategies unified under a comprehensive framework for fake news detection; and (5) the state-of-the-art datasets, patterns, and models. We present fake news detection from various perspectives, which involve news content and information in social networks, and broadly adopt techniques in data mining, machine learning, natural language processing, information retrieval and social search. Facing the upcoming 2020 U.S. presidential election, challenges for automatic, effective and efficient fake news detection are also clarified in this tutorial.


    Reza Zafarani (Syracuse University)
    Xinyi Zhou (Syracuse University)
    Kai Shu (Arizona State University)
    Huan Liu (Arizona State University)

    Tentative timeslot: AM Website
  • Recent Progress in Zeroth Order Optimization and Its Applications to Adversarial Robustness in Data Mining and Machine Learning

    Zeroth-order (ZO) optimization is increasingly embraced for solving big data and machine learning problems when explicit expressions of the gradients are difficult or infeasible to obtain. It achieves gradient-free optimization by approximating the full gradient via efficient gradient estimators. Some recent important applications include: a) generation of prediction-evasive, black-box adversarial attacks on deep neural networks, b) online network management with limited computation capacity, c) parameter inference of black-box/complex systems, and d) bandit optimization in which a player receives partial feedback in terms of loss function values revealed by her adversary.

    This tutorial aims to provide a comprehensive introduction to recent advances in ZO optimization methods in both theory and applications. On the theory side, we will cover convergence rate and iteration complexity analysis of ZO algorithms and make comparisons to their first-order counterparts. On the application side, we will highlight one appealing application of ZO optimization to studying the robustness of deep neural networks - practical and efficient adversarial attacks that generate adversarial examples from a black-box machine learning model. We will also summarize potential research directions regarding ZO optimization, big data challenges and some open-ended data mining and machine learning problems.
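    The core primitive behind ZO optimization is easy to state: estimate the gradient from function values alone. Below is a minimal two-point Gaussian-direction estimator (a generic sketch; the smoothing parameter and sample count are arbitrary choices, and practical black-box attack methods add many refinements on top):

```python
import random

def zo_gradient(f, x, mu=1e-4, n_samples=20, rng=None):
    """Two-point zeroth-order gradient estimate: average
    (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random Gaussian
    directions u.  In expectation this recovers the gradient of a
    smoothed version of f, using only function evaluations."""
    rng = rng or random.Random(0)
    d = len(x)
    g = [0.0] * d
    for _ in range(n_samples):
        u = [rng.gauss(0, 1) for _ in range(d)]
        diff = (f([xi + mu * ui for xi, ui in zip(x, u)])
                - f([xi - mu * ui for xi, ui in zip(x, u)])) / (2 * mu)
        for i in range(d):
            g[i] += diff * u[i] / n_samples
    return g

# sanity check on a quadratic, where the true gradient at x is 2x
f = lambda x: sum(xi * xi for xi in x)
x = [1.0, -2.0]
g = zo_gradient(f, x, n_samples=2000)  # true gradient is [2.0, -4.0]
```

    The estimator's variance (and hence the query cost) grows with the dimension, which is exactly the tension between convergence rates and query budgets that the tutorial's theory part analyzes.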


    Pin-Yu Chen (IBM Research)
    Sijia Liu (IBM Research)

    Tentative timeslot: PM Website
  • Forecasting Big Time Series: Theory and Practice

    Time series forecasting is a key ingredient in the automation and optimization of business processes: in retail, deciding which products to order and where to store them depends on the forecasts of future demand in different regions; in cloud computing, the estimated future usage of services and infrastructure components guides capacity planning; and workforce scheduling in warehouses and factories requires forecasts of the future workload. Recent years have witnessed a paradigm shift in forecasting techniques and applications, from computer-assisted model- and assumption-based to data-driven and fully-automated. This shift can be attributed to the availability of large, rich, and diverse time series data sources and results in a set of challenges that need to be addressed, such as the following. How can we build statistical models to efficiently and effectively learn to forecast from large and diverse data sources? How can we leverage the statistical power of “similar” time series to improve forecasts in the case of limited observations? What are the implications for building forecasting systems that can handle large data volumes?

    The objective of this tutorial is to provide a concise and intuitive overview of the most important methods and tools available for solving large-scale forecasting problems. We review the state of the art in three related fields: (1) classical modeling of time series, (2) tensor analysis, and (3) modern deep learning methods for forecasting. Furthermore, we discuss the practical aspects of building a large-scale forecasting system, including data integration, feature generation, backtest frameworks, error tracking and analysis, etc. Our focus is on providing an intuitive overview of the methods and practical issues, which we will illustrate via case studies and interactive materials with Jupyter notebooks.


    Christos Faloutsos (CMU and Amazon)
    Valentin Flunkert (AWS AI Labs)
    Jan Gasthaus (AWS AI Labs)
    Tim Januschowski (AWS AI Labs)
    Yuyang (Bernie) Wang (AWS AI Labs)

    Tentative timeslot: PM Website
  • Deep Natural Language Processing for Search and Recommender Systems

    Search and recommender systems, involving various offline and online components, are becoming increasingly complex. The two systems share many fundamental components such as language understanding for queries or documents, retrieval and ranking for documents or items, and language generation for interacting with users. Natural language text data, such as queries, user profiles, and documents, are the primary data in both systems. Thus, building powerful search and recommender systems inevitably requires processing and understanding natural language effectively and efficiently. The recent rapid growth of deep learning technologies has presented both opportunities and challenges in this area. This tutorial offers an overview of deep learning based natural language processing for search and recommender systems from an industry perspective. We focus on how deep natural language processing powers search and recommender systems in practice. The tutorial first introduces deep learning based natural language processing technologies, including language understanding and language generation. Then it details how those technologies can be applied to common tasks in search and recommender systems, including query and document understanding, retrieval and ranking, and language generation. Applications in LinkedIn production systems are presented as concrete examples where practical challenges are discussed. The tutorial concludes with a discussion of future trends in both systems.


    Weiwei Guo (LinkedIn)
    Huiji Gao (LinkedIn)
    Jun Shi (LinkedIn)
    Bo Long (LinkedIn)
    Liang Zhang (LinkedIn)
    Bee-Chung Chen (LinkedIn)
    Deepak Agarwal (LinkedIn)

    Tentative timeslot: PM Website
  • Spatio-temporal event forecasting and precursor identification

    Spatio-temporal societal event forecasting, which has traditionally been prohibitively challenging, is now becoming possible and experiencing rapid growth thanks to big data from Open Source Indicators (OSI) such as social media, news sources, blogs, economic indicators, and other meta-data sources. Spatio-temporal societal event forecasting and precursor discovery benefit society by providing insight into events such as political crises, humanitarian crises, mass violence, riots, mass migrations, disease outbreaks, economic instability, resource shortages, natural disasters, and others.

    In contrast to traditional event detection that identifies ongoing events, event forecasting focuses on predicting future events yet to happen. Also different from traditional spatio-temporal predictions on numerical indices, spatio-temporal event forecasting needs to leverage the heterogeneous information from OSI to discover the predictive indicators and their mappings to future societal events. While studying large-scale societal events, policy makers and practitioners aim to identify precursors to such events to help understand causative attributes and ensure accountability. The resulting problems typically require predictive modeling techniques that can jointly handle semantic, temporal, and spatial information, and require the design of efficient and interpretable algorithms that scale to high-dimensional, large real-world datasets.

    In this tutorial, we will present a comprehensive review of the state-of-the-art methods for spatio-temporal societal event forecasting. First, we will categorize the OSI inputs and the predicted societal events commonly researched in the literature. Then we will review methods for temporal and spatio-temporal societal event forecasting. Next, we will discuss the foundations of precursor identification with an introduction to various machine learning approaches that aim to discover precursors while forecasting events. Throughout the tutorial, we expect to illustrate the basic theoretical and algorithmic ideas and discuss specific applications in all the above settings.


    Yue Ning (Stevens Institute of Technology)
    Liang Zhao (George Mason University)
    Feng Chen (SUNY, Albany)
    Chang-tien Lu (Virginia Tech)
    Huzefa Rangwala (George Mason University)

    Tentative timeslot: PM Website
  • Are You My Neighbor? Bringing Order to Neighbor Computing Problems

    Finding nearest neighbors is an important topic that has attracted much attention over the years and has applications in many fields, such as market basket analysis, plagiarism and anomaly detection, advertising, community detection, ligand-based virtual screening, etc. As data become easier and easier to collect, finding neighbors has become a potential bottleneck in analysis pipelines. Performing all pairwise comparisons given the massive datasets of today is no longer feasible. The high computational complexity of the task has led researchers to develop approximate methods, which find many but not all of the nearest neighbors. Yet, for some types of data, efficient exact solutions have been found by carefully partitioning or filtering the search space in a way that avoids most unnecessary comparisons.

    In recent years, there have been several fundamental advances in our ability to efficiently identify appropriate neighbors, especially in non-traditional data, such as graphs or document collections. In this tutorial, we provide an in-depth overview of recent methods for finding (nearest) neighbors, focusing on the intuition behind choices made in the design of those algorithms as well as the utility of the methods in real-world applications. Our tutorial aims to provide a unifying view of “neighbor computing” problems, spanning from numerical data to graph data, from categorical data to sequential data, and related application scenarios. For each type of data, we will review the current state-of-the-art approaches used to identify neighbors and discuss how neighbor search methods are used to solve important problems.
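    For reference, the following sketch (our own illustration, not from the tutorial) shows the exact brute-force baseline whose per-query cost motivates the filtering and partitioning techniques surveyed:

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_neighbors(query, data, k=2):
    """Exact brute-force k-nearest-neighbor search: one full scan of the
    dataset per query -- the O(n * d) cost that smarter methods avoid."""
    return heapq.nlargest(k, range(len(data)), key=lambda i: cosine(query, data[i]))

vectors = [[1, 0], [0, 1], [1, 1], [0.9, 0.1]]
print(top_k_neighbors([1, 0], vectors))  # indices of the two most similar vectors
```

    Filtering methods prune candidates whose norms or shared features make a high similarity impossible, so most of these pairwise comparisons are never performed.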


    David C. Anastasiu (San Jose State University)
    Huzefa Rangwala (George Mason University)
    Andrea Tagarelli (University of Calabria, Italy)

    Tentative timeslot: AM Website
  • Learning From Networks: Algorithms, Theory, & Applications

    Arguably, every entity in this universe is networked in one way or another. With the prevalence of network data collected, such as social media and biological networks, learning from networks has become an essential task in many applications. It is well recognized that network data is intricate and large-scale, and analytic tasks on network data are becoming more and more sophisticated. In this tutorial, we systematically review the area of learning from networks, including algorithms, theoretical analysis, and illustrative applications. Starting with a quick recollection of the exciting history of the area, we formulate the core technical problems. Then, we introduce the fundamental approaches, namely feature selection based approaches and network embedding based approaches. Next, we extend our discussion to attributed networks, which are popular in practice. Last, we cover the latest hot topic, graph neural network based approaches. For each group of approaches, we also survey the associated theoretical analysis and real-world application examples. Our tutorial also raises a series of open problems and challenges that may lead to future breakthroughs. The authors are productive and seasoned researchers active in this area who represent a nice combination of academia and industry.
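    To make the network embedding family concrete, here is a minimal sketch (our illustration; names are our own) of the truncated random walks that DeepWalk-style methods generate as a corpus for a skip-gram model:

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=0):
    """Generate truncated random walks over an adjacency-list graph.
    Each walk is a 'sentence' of nodes; an embedding method then treats
    co-occurring nodes like co-occurring words."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                nbrs = adj[node]
                if not nbrs:  # dead end: truncate the walk
                    break
                node = rng.choice(nbrs)
                walk.append(node)
            walks.append(walk)
    return walks

graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
for w in random_walks(graph):
    print(w)
```

    Feeding these walks to an off-the-shelf word2vec implementation yields node vectors usable for the downstream tasks the tutorial discusses.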


    Xiao Huang (Texas A&M)
    Peng Cui (Tsinghua)
    Yuxiao Dong (Microsoft)
    Jundong Li (Arizona State/University of Virginia)
    Huan Liu (Arizona State)
    Jian Pei (Simon Fraser University)
    Le Song (Georgia Institute of Technology)
    Jie Tang (Tsinghua)
    Fei Wang (Cornell University)
    Hongxia Yang (Alibaba)
    Wenwu Zhu (Tsinghua)

    Tentative timeslot: All day Website
  • Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks

    The tutorial will review recent developments in using techniques from statistical mechanics to understand the properties of modern deep neural networks. Although there have long been connections between statistical mechanics and neural networks, in recent decades these connections have withered. In light of recent failings of traditional statistical learning theory and stochastic optimization theory to describe, even qualitatively, many properties of production-quality deep neural network models, researchers have revisited ideas from the statistical mechanics of neural networks. The tutorial will provide an overview of the area; it will go into detail on how connections with heavy-tailed random matrix theory can lead to a practical phenomenological theory for large-scale deep neural networks; and it will describe future directions.


    Charles Martin (Calculation Consulting)
    Michael Mahoney (UC Berkeley)

    Tentative timeslot: PM Website
  • Mining temporal networks

    Networks (or graphs) are used to represent and analyze large datasets of objects and their relations. Typical examples of graph applications come from social networks, traffic networks, electric power grids, road systems, the Internet, chemical and biological systems, and more. Naturally, real-world networks have a temporal component: for instance, interactions between objects have a timestamp and a duration. In this tutorial we present models and algorithms for mining temporal networks, i.e., network data with temporal information. We overview different models used to represent temporal networks. We highlight the main differences between static and temporal networks, and discuss the challenges arising from introducing the temporal dimension in the network representation. We present recent papers addressing the most well-studied problems in the setting of temporal networks, including computation of centrality measures, motif detection and counting, community detection and monitoring, event and anomaly detection, analysis of epidemic processes and influence spreading, network summarization, and structure prediction. Finally, we discuss some of the current challenges and open problems in the area, and we highlight directions for future work.
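    One key difference noted above, that paths must respect edge timestamps, can be sketched in a few lines (our illustration, not from the tutorial; it assumes strictly increasing times along a path):

```python
def temporal_reachable(edges, source, t_start=0):
    """Earliest-arrival temporal reachability: edge (u, v, t) is usable
    only if u was reached at some time <= t. A single pass over the
    edges sorted by timestamp suffices."""
    arrival = {source: t_start}
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and t >= arrival[u] and v not in arrival:
            arrival[v] = t
    return arrival

edges = [("a", "b", 1), ("c", "d", 1), ("b", "c", 2)]
# The edge (c, d, 1) fires before c is reached, so d stays unreachable,
# even though the same edges viewed as a static graph would reach d.
print(temporal_reachable(edges, "a"))
```

    This illustrates why static-graph algorithms cannot simply be reused: connectivity itself depends on the ordering of interactions.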


    Aristides Gionis (Aalto University)
    Polina Rozenshtein (Nordea Data Science Lab)

    Tentative timeslot: AM Website
  • Data Integration and Machine Learning: A Natural Synergy

    There is now more data to analyze than ever before. As data volume and variety have increased, the ties between machine learning and data integration have grown stronger. For machine learning to be effective, one must utilize data from the greatest possible variety of sources; this is why data integration plays a key role. At the same time, machine learning is driving automation in data integration, resulting in an overall reduction of integration costs and improved accuracy. This tutorial focuses on three aspects of the synergistic relationship between data integration and machine learning: (1) we survey how state-of-the-art data integration solutions rely on machine learning-based approaches for accurate results and effective human-in-the-loop pipelines, (2) we review how end-to-end machine learning applications rely on data integration to identify accurate, clean, and relevant data for their analytics exercises, and (3) we discuss open research challenges and opportunities that span across data integration and machine learning.
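    As a tiny illustration of the kind of ML-friendly signal used in integration tasks such as entity matching (our example, not from the tutorial), a token-set Jaccard similarity between two product records:

```python
def jaccard(a, b):
    """Token-set Jaccard similarity, a classic feature fed to an ML
    matcher when deciding whether two records refer to the same entity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

r1 = "Apple iPhone 11 64GB Black"
r2 = "iPhone 11 black 64gb by Apple"
print(round(jaccard(r1, r2), 3))  # high overlap despite different formatting
```

    In practice such similarity features are combined with many others and passed to a learned matcher, rather than thresholded by hand.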


    Xin Luna Dong (Amazon)
    Theodoros Rekatsinas (University of Wisconsin-Madison)

    Tentative timeslot: PM Website
  • Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments

    A/B testing is the gold standard for estimating the causal relationship between a change in a product and its impact on key outcome measures. It is widely used in industry to test changes ranging from a simple copy or UI change to more complex changes like using machine learning models to personalize the user experience. The key aspect of A/B testing is the evaluation of experiment results. Designing the right set of metrics - correct outcome measures, data quality indicators, guardrails that prevent harm to the business, and a comprehensive set of supporting metrics to understand the "why" behind the key movements - is the #1 challenge practitioners face when trying to scale their experimentation program. On the technical side, improving the sensitivity of experiment metrics is a hard problem and an active research area, with large practical implications as more and more small and medium-size businesses try to adopt A/B testing and suffer from insufficient power. In this tutorial we will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on lessons learned and practical guidelines as well as open research questions.
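    For concreteness, here is a minimal sketch (our own, not from the tutorial) of the Welch's t statistic commonly used to evaluate a metric movement between treatment and control:

```python
import math
from statistics import mean, variance

def welch_t(treatment, control):
    """Welch's t statistic and degrees of freedom for a difference in
    means; a large |t| relative to the t distribution indicates a
    statistically significant metric movement."""
    va = variance(treatment) / len(treatment)
    vb = variance(control) / len(control)
    t = (mean(treatment) - mean(control)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(treatment) - 1) + vb ** 2 / (len(control) - 1))
    return t, df

# Per-user conversion indicators from a toy experiment:
t, df = welch_t([1, 0, 1, 1], [0, 0, 1, 0])
print(round(t, 3), round(df, 1))  # prints: 1.414 6.0
```

    Improving sensitivity then amounts to shrinking the variance terms, for example with variance-reduction techniques such as CUPED, or by collecting more users per variant.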


    Xiaolin Shi (Snap Inc)
    Pavel Dmitriev (Outreach)
    Somit Gupta (Microsoft)
    Xin Fu (Facebook)

    Tentative timeslot: AM Website
  • Modeling and Applications for Temporal Point Processes

    Real-world entities’ behaviors, together with their side information, are often recorded over time as asynchronous event sequences. Such event sequences are the basis of many practical applications, such as neural spike train analysis, earthquake prediction, crime analysis, infectious disease diffusion forecasting, condition-based preventative maintenance, information retrieval, and behavior-based network analysis and services. The temporal point process (TPP) is a principled mathematical tool for the modeling and learning of asynchronous event sequences, which captures the instantaneous happening rate of the events and the temporal dependency between historical and current events. TPPs provide us with an interpretable model to describe the generative mechanism of event sequences, which is beneficial for event prediction and causality analysis. Recently, it has been shown that TPPs have potential in many machine learning and data science applications and can be combined with other cutting-edge machine learning techniques like deep learning, reinforcement learning, adversarial learning, and so on.
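    The instantaneous happening rate mentioned above is the conditional intensity; for the classic Hawkes process, a standard TPP, it is a base rate plus exponentially decaying excitation from past events. A minimal sketch (our own, with illustrative parameter values):

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t:
    base rate mu plus a kick alpha * exp(-beta * (t - ti)) from each
    past event ti, so recent events raise the current event rate."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)

events = [1.0, 2.0, 2.5]
print(round(hawkes_intensity(3.0, events), 4))  # elevated by the recent burst
print(hawkes_intensity(0.5, []))                # no history: just the base rate
```

    Neural TPPs of the kind combined with deep learning replace this fixed parametric kernel with a learned function of the history.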


    Junchi Yan (SJTU)
    Hongteng Xu (Duke & Infinia ML, Inc)
    Liangda Li (Yahoo Research)

    Tentative timeslot: AM Website
  • Deep Reinforcement Learning with Applications in Transportation

    Transportation, particularly the mobile ride-sharing domain, has a number of traditionally challenging dynamic decision problems that have long threads of research literature and readily stand to benefit tremendously from artificial intelligence (AI). Some core examples include online ride order dispatching, which matches available drivers to trip-requesting passengers on a ride-sharing platform in real time; route planning, which plans the best route between the origin and destination of a trip; and traffic signal control, which dynamically and adaptively adjusts the traffic signals within a region to achieve low delays. All of these problems share a common characteristic: a sequence of decisions is to be made while we care about a cumulative objective over a certain horizon. Reinforcement learning (RL) is a machine learning paradigm that trains an agent to learn to take optimal actions (as measured by the total cumulative reward achieved) in an environment through interactions with it and feedback signals. It is thus a class of optimization methods for solving sequential decision-making problems.
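    The agent-environment loop described above can be sketched with tabular Q-learning on a toy corridor task (our illustration, far simpler than the dispatching and signal-control settings the tutorial covers):

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a corridor: states 0..n-1, actions 0=left
    and 1=right, reward 1 only on entering the rightmost (terminal)
    state. The agent learns from interaction and reward feedback alone."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # temporal-difference update toward the bootstrapped target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(4)])  # greedy policy (1 = right)
```

    Deep RL replaces the table with a neural network, which is what lets the same update rule scale to the enormous state spaces of dispatching, routing, and signal control.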


    Zhiwei (Tony) Qin (Didi Chuxing)
    Jian Tang (Didi Chuxing & Syracuse University)
    Jieping Ye (Didi Chuxing & University of Michigan, Ann Arbor)

    Tentative timeslot: PM Website
