To attend all the talks via Zoom, find "Deep Learning Day" in the "Breakout Sessions", and join the "Chat" for any of the sessions. Videos are only available for some of the talks.
Deep Learning Day at KDD 2020
The impact of deep learning in data science has been nothing short of transformative. Powered by the surge in modern computation capabilities, widespread data availability, and advances in coding frameworks, deep neural networks are now ubiquitous. Deep methods yield state-of-the-art performance in many domains (computer vision, speech recognition and generation, natural language processing), and are still widening their lead as more research appears daily. As the field has matured, it has seen many theoretical and practical advances. Recently, there has been a shift towards more rigorous and robust experiments, and towards more interpretable and transparent theory and models that can help explain the empirical successes seen in many real-world applications.
At KDD 2020, Deep Learning Day is a plenary event that is dedicated to providing a clear, wide overview of recent developments in deep learning. This year’s focus is on “Beyond Supervised Learning” with four theme areas: causality, transfer learning, graph mining, and reinforcement learning. The day will include plenary keynotes given by thought leaders and research highlights by rising stars in each theme area.
On behalf of the Deep Learning Day and KDD 2020 organizing committee, we welcome you all to attend this event!
Pacific Time, Monday, 24th August, 2020
8:00 - 8:45 Joelle Pineau
8:45 - 9:30 Csaba Szepesvari
9:30 - 9:45 Research highlight: Ashley Edwards
9:45 - 10:15 Live QA
10:15 - 11:00 Uri Shalit
11:00 - 11:45 Susan Athey
11:45 - 12:00 Research highlight: Martin Arjovsky
12:00 - 12:30 Live QA
12:30 - 1:15 Hugo Larochelle
1:15 - 2:00 Shai Ben-David
2:00 - 2:15 Research highlight: Mandar Joshi
2:15 - 2:45 Live QA
2:45 - 3:30 Yizhou Sun
3:30 - 4:15 Will Hamilton
4:15 - 4:30 Research highlight: Weihua Hu
4:30 - 5:00 Live QA
Plenary Keynote Speaker
Bio: Susan Athey’s research is in the areas of industrial organization, microeconomic theory, and applied econometrics. Her current research focuses on the design of auction-based marketplaces and the economics of the internet, primarily on online advertising and the economics of the news media. She has also studied dynamic mechanisms and games with incomplete information, comparative statics under uncertainty, and econometric methods for analyzing auction models.
Bio: Uri Shalit is a senior lecturer (assistant professor) at the Technion - Israel Institute of Technology, Faculty of Industrial Engineering and Management, in the areas of statistics and information systems. Uri's research is focused on two subjects: The first is applying machine learning to the field of healthcare, especially in terms of providing physicians with decision support tools based on big health data. The second is the intersection of machine learning and causal inference, with a focus on learning individual-level effects. Previously, Uri was a postdoctoral researcher in Prof. David Sontag’s Clinical Machine Learning Lab in NYU and then MIT. He completed his PhD studies at the Center for Neural Computation at The Hebrew University of Jerusalem. From 2011 to 2014 he was a recipient of Google's European Fellowship in Machine Learning.
Meta-Learning - A Roadmap for Few-Shot Transfer Learning
Much of the recent progress on many AI tasks was enabled in part by the availability of large quantities of labeled data for deep learning. Yet, humans are able to learn new concepts or tasks from as little as a handful of examples. Meta-learning has been a promising framework for addressing the problem of generalizing from small amounts of data, known as few-shot learning. In this talk, I’ll present an overview of the state of this research area. I'll describe Meta-Dataset, a new benchmark we developed to push the development of few-shot learning methods further towards a more realistic multi-domain setting. Notably, I'll present the Universal Representation Transformer (URT) layer, which meta-learns to leverage universal multi-domain features for few-shot classification by dynamically re-weighting and composing the most appropriate domain-specific representations. The URT layer allows us to reach state-of-the-art performance on Meta-Dataset. I'll end by discussing my perspective on promising future directions.
Bio: Hugo Larochelle is a Research Scientist at Google Brain and lead of the Montreal Google Brain team. He is also a member of Yoshua Bengio's Mila and an Adjunct Professor at the Université de Montréal. Previously, he was an Associate Professor at the University of Sherbrooke. Larochelle also co-founded Whetlab, which was acquired in 2015 by Twitter, where he then worked as a Research Scientist in the Twitter Cortex group. From 2009 to 2011, he was also a member of the machine learning group at the University of Toronto, as a postdoctoral fellow under the supervision of Geoffrey Hinton. He obtained his Ph.D. at the Université de Montréal, under the supervision of Yoshua Bengio. Finally, he has a popular online course on deep learning and neural networks, freely accessible on YouTube.
Bio: Shai Ben-David grew up in Jerusalem, Israel. He attended the Hebrew University studying physics, mathematics and psychology. He received his PhD under the supervision of Saharon Shelah and Menachem Magidor for a thesis in set theory. Professor Ben-David was a postdoctoral fellow at the University of Toronto in the Mathematics and the Computer Science departments, and in 1987 joined the faculty of the CS Department at the Technion (Israel Institute of Technology). He held visiting faculty positions at the Australian National University in Canberra (1997-8) and at Cornell University (2001-2004). In August 2004 he joined the School of Computer Science at the University of Waterloo.
Beyond Node Classification: Exploit the Power of GNNs for Different Graph Tasks
Graph neural networks (GNNs) have received more and more attention in the past several years, due to the wide applications of graphs and networks and their superior performance compared to traditional heuristics-driven approaches. Most existing GNNs focus on solving node-level applications, such as node classification and link prediction. However, there exist many challenging graph tasks, such as network dismantling and graph search, which are NP-hard by nature. In this talk, I will introduce our recent progress on designing GNNs for hard graph tasks. In particular, we will examine three challenging tasks that are key to the success of mining real-world networks and graphs: (1) How to address heterogeneous and dynamically evolving networks, and how to transfer learned knowledge from some portion of a graph to other portions? (2) How to predict node properties that are related to the whole graph, and how to conduct node selection based on a graph-level criterion? And (3) how to design graph-level GNNs to facilitate comparisons between graphs, and how to conduct efficient graph search in a graph database? Finally, we will discuss open questions in the field.
Bio: Yizhou Sun is an associate professor in the Department of Computer Science at UCLA. Prior to that, she was an assistant professor in the College of Computer and Information Science of Northeastern University. She received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2012. Her principal research interest is in mining graphs/networks, and more generally in data mining, machine learning, and network science, with a focus on modeling novel problems and proposing scalable algorithms for large-scale, real-world applications. She is a pioneering researcher in mining heterogeneous information networks, with a recent focus on deep learning on graphs/networks. Yizhou has over 100 publications in books, journals, and major conferences. Tutorials on her research have been given at many premier conferences. She received the 2012 ACM SIGKDD Best Student Paper Award, 2013 ACM SIGKDD Doctoral Dissertation Award, 2013 Yahoo ACE (Academic Career Enhancement) Award, 2015 NSF CAREER Award, 2016 CS@ILLINOIS Distinguished Educator Award, 2018 Amazon Research Award, and 2019 Okawa Foundation Research Grant.
Graph Representation Learning: Recent Advances and Open Challenges
Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial if we want systems that can learn, reason, and generalize from this kind of data. Recent years have seen a surge in research on graph representation learning, most prominently in the development of graph neural networks (GNNs). Advances in GNNs have led to state-of-the-art results in numerous domains, including chemical synthesis, 3D-vision, recommender systems, question answering, and social network analysis. In the first part of this talk I will provide an overview and summary of recent progress in this fast-growing area, highlighting foundational methods and theoretical motivations. In the second part of this talk I will discuss fundamental limitations of the current GNN paradigm and propose open challenges for the theoretical advancement of the field.
Bio: William L. Hamilton is an Assistant Professor of Computer Science at McGill University and a Canada CIFAR AI Chair at Mila - The Quebec AI Institute. His research focuses on graph representation learning, as well as applications in computational social science and biology. In recent years, he has published numerous influential papers on graph representation learning at top-tier venues across machine learning and network science, and his work on the subject has received over 3000 citations since 2017. William received the 2018 Arthur L. Samuel Thesis Award for the best Ph.D. thesis in the Computer Science department at Stanford University as well as the 2017 Cozzarelli Best Paper Award from PNAS.
Bio: Joelle Pineau is a faculty member at Mila and an Associate Professor and William Dawson Scholar at the School of Computer Science at McGill University, where she co-directs the Reasoning and Learning Lab. She is also co-Managing Director of Facebook AI Research, and the director of its lab in Montreal, Canada. She holds a BASc in Engineering from the University of Waterloo, and an MSc and PhD in Robotics from Carnegie Mellon University. Dr. Pineau's research focuses on developing new models and algorithms for planning and learning in complex partially-observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. She serves on the editorial board of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research and is Past-President of the International Machine Learning Society. She is a recipient of NSERC's E.W.R. Steacie Memorial Fellowship (2018), a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Senior Fellow of the Canadian Institute for Advanced Research (CIFAR), a member of the College of New Scholars, Artists and Scientists by the Royal Society of Canada, and a 2019 recipient of the Governor General's Innovation Awards.
Myths and Misconceptions in Reinforcement Learning
Reinforcement Learning and its deep variant are rapidly gaining popularity. As is well known, any sufficiently large group of people will have a natural tendency to create a system of beliefs which may or may not be correct. The field of reinforcement learning is not immune to this either. For this talk, I have collected a number of myths and misconceptions related to reinforcement learning, which I believe are widespread enough to be worth examining and discussing. For each of the myths and misconceptions, I will talk about its origin, why I think it is untenable, and how I think about the underlying issues.
Bio: Csaba Szepesvari is a Canada CIFAR AI Chair, the team-lead for the "Foundations" team at DeepMind and a Professor of Computing Science at the University of Alberta. He earned his PhD in 1999 from Jozsef Attila University, in Szeged, Hungary. He has authored three books and about 200 peer-reviewed journal and conference papers. He serves as an action editor of the Journal of Machine Learning Research and Machine Learning, as well as on various program committees. Dr. Szepesvari's interest is artificial intelligence (AI) and, in particular, principled approaches to AI that use machine learning. He is the co-inventor of UCT, a widely successful Monte-Carlo tree search algorithm. UCT ignited much work in AI, such as DeepMind's AlphaGo, which defeated the top Go professional Lee Sedol in a landmark game. This work on UCT won the 2016 test-of-time award at ECML/PKDD.
Plenary Invited Speaker
What is a spurious correlation?
The talk will center on the deep relationship between out-of-distribution generalization, causality, and invariant correlations. From the study of this relationship, we introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.
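For readers who prefer a formula, the idea of a representation whose optimal classifier matches across all training distributions can be written as a bilevel problem. The notation below is an assumption for illustration and does not appear in the abstract: $\mathcal{E}_{tr}$ is the set of training distributions (environments), $R^e$ is the risk under environment $e$, $\Phi$ is the data representation, and $w$ is the classifier on top of it.

```latex
\begin{aligned}
\min_{\Phi,\; w} \quad & \sum_{e \in \mathcal{E}_{tr}} R^{e}(w \circ \Phi) \\
\text{subject to} \quad & w \in \operatorname*{arg\,min}_{\bar{w}} \; R^{e}(\bar{w} \circ \Phi)
\quad \text{for all } e \in \mathcal{E}_{tr}.
\end{aligned}
```

The constraint is what encodes invariance: the same $w$ must be simultaneously optimal in every training environment, so $\Phi$ is pushed away from features whose relationship to the label changes between environments.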
Bio: I'm Martin Arjovsky, currently a postdoc at ENS working with Francis Bach. Last year I finished my PhD at New York University, advised by Léon Bottou. I did my undergraduate and master's at the University of Buenos Aires, Argentina (my home country). In between, I took a year off to do internships in different places (Google, Facebook, Microsoft, and the Université de Montréal). My master's thesis advisor was Yoshua Bengio, who also advised me during my stay at UdeM. In general I'm interested in the intersection between learning and mathematics: how we can ground the different learning processes that are needed in different problems, and how to leverage this knowledge to develop better algorithms.
Efficient Scalable Pre-training for Natural Language Processing
Self-supervised pre-trained representations have been the single biggest driver of empirical progress on a wide variety of NLP tasks ranging from text classification to question answering. These models--often named after Sesame Street characters--are self-supervised in that they are pre-trained using supervision that can be obtained from unlabeled text (e.g., by reconstructing noised text). The self-supervised learning paradigm, coupled with advances in GPU technology, has allowed models to scale to billions of parameters. In this talk, I will discuss two research directions: (a) SpanBERT: designing resource-efficient pre-training tasks amenable to scaling, and (b) RoBERTa: a recipe for large-scale pre-training in terms of data, parameters, and computation. Finally, I will highlight some of the latest trends--few-shot learning and non-parametric memories--and speculate on future research directions.
Bio: Mandar Joshi is a PhD candidate at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. He is advised by Luke Zettlemoyer and Daniel Weld. His research focuses on natural language processing, specifically pre-training text representations that encode linguistic and factual knowledge using large corpora. Before joining the UW, he obtained his Master's degree at the Indian Institute of Technology (IIT) Bombay, and spent a year at IBM Research, Bangalore.
Advances in Graph Neural Networks: Expressive Power, Pre-training, and Open Graph Benchmark
Machine learning on graphs, especially with Graph Neural Networks (GNNs), is an emerging field of research with diverse application domains. In this talk, I will first present our theoretical and methodological advances in GNNs, analysing the expressive power of GNNs and proposing their effective pre-training strategies. Next, I aim to address the issue that the field is lacking appropriate benchmark datasets to rigorously and reliably evaluate the progress. To this end, I present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale (up to 100+ million nodes and 1+ billion edges), encompass multiple important graph ML tasks, and cover a diverse range of domains. We show that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. OGB provides an automated end-to-end graph ML pipeline that simplifies and standardizes the process of graph data loading, experimental setup, and model evaluation.
Bio: Weihua Hu is a Ph.D. student of Computer Science at Stanford University, advised by Jure Leskovec. His research interests lie in graph representation learning and its applications to scientific discovery. His recent research is on advancing the field of Graph Neural Networks, by improving their theoretical understanding and generalization capability as well as building large-scale datasets for benchmarking models. He also actively applies his research to drug discovery and material discovery. He is supported by the Funai Overseas Scholarship and the Masason Foundation Fellowship. Before joining Stanford, Weihua received both his Bachelor's and Master's degrees from the University of Tokyo, where he received the best Master's thesis award.
Estimating Q(s,s') with Deep Deterministic Dynamics Gradients
In this talk, I will discuss our recent ICML paper, Estimating Q(s,s') with Deep Deterministic Dynamics Gradients. In this paper, we introduce a novel form of value function, Q(s, s'), that expresses the utility of transitioning from a state s to a neighboring state s' and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by suboptimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.
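To make the idea of a state-to-state value function concrete, here is a minimal tabular sketch. It is not the method from the paper: the learned forward dynamics model is replaced by plain enumeration of neighboring states, and the five-state chain, the reward, the discount factor, and all function names are hypothetical choices made only for illustration. It assumes deterministic transitions and uses the recursion Q(s, s') = r(s, s') + gamma * max over s'' of Q(s', s''), which is one natural reading of "transition to s', then act optimally thereafter".

```python
GAMMA = 0.9            # discount factor (assumed for illustration)
N_STATES = 5           # hypothetical chain of states 0..4
GOAL = N_STATES - 1    # reaching the last state ends the episode


def neighbors(s):
    """States reachable from s in one step; the goal state is terminal."""
    if s == GOAL:
        return []
    return sorted({max(s - 1, 0), s, min(s + 1, N_STATES - 1)})


def reward(s, s_next):
    """Hypothetical reward: 1 for entering the goal state, 0 otherwise."""
    return 1.0 if s_next == GOAL else 0.0


def qss_value_iteration(n_sweeps=100):
    """Repeatedly apply Q(s, s') = r(s, s') + GAMMA * max_s'' Q(s', s'')."""
    q = {(s, s2): 0.0 for s in range(N_STATES) for s2 in neighbors(s)}
    for _ in range(n_sweeps):
        for (s, s2) in q:
            # Best value obtainable from s2 onward; 0 if s2 is terminal.
            future = max((q[(s2, s3)] for s3 in neighbors(s2)), default=0.0)
            q[(s, s2)] = reward(s, s2) + GAMMA * future
    return q


def greedy_next_state(q, s):
    """Pick the neighboring state with the highest Q(s, s')."""
    return max(neighbors(s), key=lambda s2: q[(s, s2)])


if __name__ == "__main__":
    q = qss_value_iteration()
    state, path = 0, [0]
    while state != GOAL:
        state = greedy_next_state(q, state)
        path.append(state)
    print(path)
```

Note that no action appears anywhere in the table: values are attached to state pairs, which is the sense in which this formulation decouples actions from values. Turning a chosen next state back into an action would need a separate component, which this sketch omits.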
Bio: Ashley Edwards is a research scientist focusing on developing general goal representations for deep reinforcement learning, imitation learning, and model-based reinforcement learning problems. She obtained her PhD in Computer Science from Georgia Tech under the supervision of Prof. Charles Isbell in 2019. She was formerly a research scientist at Uber AI Labs, and during her time as a PhD student at Georgia Tech, she was a recipient of the NSF Graduate Research Fellowship, was a visiting researcher at Waseda University in Japan, and interned at Google Brain. She received a B.S. in Computer Science from the University of Georgia in 2011.