KDD Cup 2019

KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining, the leading professional organization of data miners.


KDD Cup 2019 competition tracks:

Across the three competitions, more than 2,800 teams registered from over 39 countries and 230 academic and research institutions. Of those, 1,200 teams actively participated, comprising more than 5,000 individuals who made 17,000 submissions in total. Total rewards exceeding $100K will be given to the winners and leaders of the competitions.

Regular Machine Learning Competition Track (Regular ML Track)

Context-aware multi-modal transportation recommendation aims to recommend a travel plan that considers various unimodal transportation modes, such as walking, cycling, driving, and public transit, as well as how to connect these modes under various contexts. Successful multi-modal transportation recommendation can have a number of advantages, including but not limited to shorter transport times, more balanced traffic flows, reduced traffic congestion, and, ultimately, the development of intelligent transportation systems.

Despite the popularity and frequent use of transportation recommendation in navigation apps (e.g., Baidu Maps and Google Maps), existing transportation recommendation solutions consider routes in only one transportation mode. Intuitively, in the context-aware multi-modal transportation recommendation problem, transport mode preferences vary across users and spatiotemporal contexts.

Sponsor: Baidu.com

Total reward: $45,000

Winner Announcement

Automated Machine Learning Competition Track (Auto-ML Track)

Temporal relational data is very common in industrial machine learning applications, such as online advertising, recommender systems, financial market analysis, medical treatment, and fraud detection. With timestamps indicating when events occur and multiple related tables providing different perspectives, such data contains useful information that can be exploited to improve machine learning performance. Currently, however, temporal relational data is usually exploited by experienced human experts with in-depth domain knowledge, in a labor-intensive, trial-and-error manner.

In this challenge, participants are invited to develop AutoML solutions to binary classification problems for temporal relational data. The provided datasets take the form of multiple related tables with timestamped instances. Five public datasets (without labels in the testing part) are provided so that participants can develop their AutoML solutions. The solutions are then evaluated, without human intervention, on five unseen datasets, and the results on these five datasets determine the final ranking.

Sponsor: 4Paradigm.com

Total reward: $33,500

Winner Announcement

“Research for Humanity” Reinforcement Learning Competition Track (Humanity RL Track)

Malaria is thought to have caused the greatest disease burden in human history, and it continues to pose a significant but disproportionately distributed global health burden. About 50% of the world’s population is at risk of malaria infection, and Sub-Saharan Africa is the most affected region, accounting for 90% of all cases.

Through the KDD Cup Humanity RL track, we are looking for participants to apply machine learning tools to find novel solutions that could inform malaria policy in Sub-Saharan Africa. Specifically, the question is how combinations of interventions that control the transmission, prevalence, and health outcomes of malaria infection should be distributed in a simulated human population.

Sponsor: IBM Africa and Hexagon-ML.com

Total reward: $25,000

Winner Announcement

KDD Cup Innovation Award

To foster innovation in data science competitions and encourage the community, this year we established a special KDD Cup Innovation Award. In 2019, the KDD Cup Innovation Award is given to Hexagon-ML, our competition platform sponsor for the Reinforcement Learning competition.

Hexagon-ML receives this award for:

  • Pioneering a new type of competition, the Reinforcement Learning Competition, and implementing it under a novel computational environment for KDD Cup 2019.
  • Contributing to solving a malaria problem for humanity in a successful collaboration with IBM Research Africa. In 2019, the re-emergence of infectious diseases is among the top 10 humanitarian crises in the world.
  • Efforts in gathering, developing, and growing a data science community in Reinforcement Learning.

KDD Cup Day

Panel 1: “How should companies use competition platforms?”

Companies have used data science competitions as a strategy to drive cultural change or even to crowdsource their problems to external teams. Netflix is a well-known recent example: it pioneered this practice by crowdsourcing its recommendation algorithm. Data science competition companies such as Kaggle, Hexagon-ML, and others host competitions that are either sponsored by companies on their platforms or hosted within the companies themselves. In this panel, we will discuss with several industry leaders how companies should use data science competition platforms.

Moderator: Taposh Dutta-Roy


Claudia Perlich is a Senior Data Scientist at Two Sigma in New York City. Before joining Two Sigma, she was the Chief Scientist at Dstillery, where she designed, developed, analyzed, and optimized the machine learning that drives digital advertising to prospective customers of brands. She started her career in data science at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She tends to be domain agnostic, having worked on almost anything, from Twitter, DNA, server logs, CRM data, web usage, breast cancer, and movie ratings to many more. Perlich is an active public speaker and has published over 50 scientific papers as well as a few patents in the area of machine learning. She received her PhD in Information Systems from the NYU Stern School of Business and holds a Master of Computer Science from the University of Colorado.

Jason Jones is Chief Data Scientist at Health Catalyst. He is passionate about making it easier to use data in healthcare and helping organizations find analytic focus. Jason formerly held two senior analytics leadership roles at Kaiser Permanente (KP), as well as analytical and marketing positions at Intermountain Healthcare and Bayer Healthcare. Before that, Jason worked at Ingenix (now Optum), where he succeeded in converting United Healthcare data into a saleable asset for external customers conducting outcomes research. Throughout his academic career, Jason has taught graduate courses in statistics to medical informaticists at the University of Southern California and the University of Utah. He has published dozens of peer-reviewed papers in medicine, predictive modeling, and outcomes improvement. Jason received his PhD in Biostatistics from the University of Southern California in 2001.
Jason Jones believes better use of data is critical to achieving the Quadruple Aim. For people, it means having access to options and guidance to achieve their goals in the context of their preferences and circumstances. For providers, it means feeling supported in applying expertise and care to achieve and sustain health. For organizations, it means achieving better focus and resource allocation for sustainable improvement. Providing the right information to the right person at the right time augments individual intelligence and team success.

Lin Wang is a Senior Data Scientist at Vesta Corporation, a provider of secure electronic payment, fraud management, data, and security compliance solutions in Specialty and Consumer Finance that offers a variety of payment channels, including internet, phone, retail point of sale, and mobile commerce applications. Lin focuses on risk management, which encompasses detecting customer behavior patterns and applying machine learning models to challenging tasks across the business, including fraud detection and anomaly detection. In particular, Lin has been working on the end-to-end machine learning platform, an unsupervised anomalous-transaction alert system, and a real-time fraud detection system, which helped the company increase its confidence in approving payment transactions and reduce financial losses. Lin is one of the hosts and a committee member of the Kaggle competition “IEEE Fraud Detection”. Before joining Vesta, Lin worked as a research scientist at Clemson University, applying pattern recognition and machine learning methods to drug discovery and protein-protein interaction. Lin holds a Ph.D. in Computational Physics.

Panel 2: “How will AutoML change the future of data science?”

AutoML, as a concept and as a product, gained traction several years ago and has grown in popularity and complexity ever since. Originally designed to automate steps that are beyond the abilities of non-experts, it also makes data scientists more productive, inevitably shifting their perspective and focus and calling for different skills. During this panel, we hope to collect the opinions of people who invent, create, and use AutoML. In particular, we are interested in discussing non-trivial cases and applications of AutoML, its current limitations, the variety of existing products and how they are meeting new demands, emerging applications, and the overall progress in the area over the past few years, as well as debating how data science jobs will change under the influence of AutoML.

Moderator: Iryna Skrypnyk


Ashwin Aravindakshan is an Associate Professor at the Graduate School of Management at UC Davis and Director of the school’s MS in Business Analytics program. He studies user and market behavior across multiple business domains, including retail, mobile environments, CRM, advertising, and communications. As part of his research in these domains, he has developed new algorithms in spatiotemporal modeling, advertising dynamics, customer experience management, and optimal resource allocation. As director of the analytics program, he oversees the academic curriculum and works with firms to design practicums through which students gain real-world experience. He received a BTech in Aerospace Engineering from IIT Madras and a PhD in Business from the University of Maryland, College Park.

Dmitry Larko is a competitive data scientist who loves to spend his time on Kaggle, trying new things and enjoying competitions. Dmitry is currently a senior data scientist at H2O.ai, where he is one of the key contributors to Driverless AI, H2O.ai’s machine learning automation software. He received his Master’s degree in Computer Systems from Siberian Federal University in Russia.

Ganesh Thondikulam is an experienced IT leader with a distinguished career architecting and leading the on-time, within-budget design and delivery of enterprise-wide solutions that meet business, financial, and market demands. Ganesh is currently Executive Director of Analytics Digital Foundation at Kaiser Permanente (KP), where he is responsible for establishing the vision, laying the foundation, and setting the technology direction for KP’s data and analytics capabilities across the enterprise. Before that, Ganesh was a Principal for Intelligence Systems, Enterprise & Health Plan Architecture at KP. He joined Kaiser Permanente from Blue Shield of California, where he was Head of Enterprise Architecture. Earlier in his career, Ganesh gained extensive experience in IT architect roles at several other companies.
Ganesh completed an executive education program in Strategic Leadership of Technology and Innovation at Stanford University and holds an MBA in e-Commerce and General Management from the Opus College of Business at the University of St. Thomas, St. Paul, MN. He received an M.Sc. in Physics from the University of Texas at El Paso.

Wei-Wei Tu is a principal machine learning architect at 4Paradigm Inc., Beijing, China. He previously worked as a senior engineer for two and a half years at China’s biggest search engine, Baidu. At Baidu, he built Baidu’s first distributed GBDT training system running on hundreds of machines, deployed Baidu’s first large-scale deep learning-based click-through rate prediction system, and co-designed Baidu’s first distributed machine learning computation framework, ELF. One of his projects won the “Baidu Million Dollar Highest Prize”. At 4Paradigm Inc., Wei-Wei Tu designed and developed the distributed machine learning computation framework and led his team in developing many large-scale distributed machine learning algorithms supporting trillions of parameters and hundreds of billions of instances. He is now leading his team in building business AutoML systems at 4Paradigm Inc. He has served as co-editor of special issues of IEEE TPAMI. He is the data competition chair of PAKDD 2018 and PAKDD 2019, chair of the AutoML workshop at PRICAI 2018, one of the main organizers of the NeurIPS 2018 AutoML Challenge, and one of the main organizers of the NeurIPS 2019 AutoDL competition.

Blog posts

Visa Support

Information on visa support letters can be found here.

KDD Cup Chairs

Taposh Dutta-Roy leads an innovation team in the Decision Support Group at Kaiser Permanente (KP). His work focuses on journey analytics, deep learning, and data science architecture and strategy. Before KP, Taposh held Head of Ad Products positions at the start-ups Inpowered and Netshelter (sold to Ziff Davis). Before working at start-ups, he was a Senior Associate Consultant at the MIT-based consulting company Sapient. He was a co-founder of the biotech company Bio-Integrated solution, which developed DNA sequencers and liquid handling devices for proteomics. He has an M.Sc. in Electrical Engineering and Computer Engineering from the Illinois Institute of Technology and an MBA from UC Davis. During his PhD program, his research focused on cellular Monte Carlo simulation of device physics.

His experience spans diverse domains: biotech start-ups (www.biointsol.com), entertainment start-ups (www.fandango.com), financial firms (Countrywide Home Loans and Franklin Templeton Investments), consumer electronics (Emerson Electric and Dolby), insurance (WCIRB), consulting (SapientNitro), and, most recently, social media (www.inpwrd.com). He has a unique combination of product, technology, strategy consulting, data science, and start-up experience. He is a consumer-focused machine learning and data science geek.

Wenjun Zhou is currently an Associate Professor at the Haslam College of Business, the University of Tennessee, Knoxville, where she teaches data mining, text mining, and multivariate methods. Her general research interests include data mining, business analytics, and statistical computing. She has published prolifically in refereed conference proceedings and journals, such as KDD, ICDM, TKDE, and the Machine Learning journal. Wenjun was named the George and Margaret Melton Faculty Fellow in 2019, the Roy & Audrey Fancher Faculty Research Fellow in 2017, and the R. Stanley Bowden II Faculty Research Fellow in 2016. She is the recipient of the Best Paper Award at INFORMS Data Science 2018, the Best Student Paper Award at AOM 2017, the Best Paper Award at WAIM 2013, the Best Student Paper Runner-Up Award at KDD 2008, and the Best Paper Runner-Up Award at ICTAI 2006. She was among the top five finalists for the ACM SIGKDD Doctoral Dissertation Award in 2011. Wenjun has worked with a variety of companies on data analytics projects, including Panasonic, Yahoo!, IBM, CareerBuilder, Bush Brothers, Procter & Gamble, Capital One, and Coca-Cola. Wenjun serves regularly on NSF review panels and program committees of international conferences. She is a senior member of ACM and IEEE, and a member of INFORMS.

Iryna Skrypnyk is currently a Senior Manager at the Global Real-World Data and Analytics Division of Pfizer Digital, where she leads AI and machine learning technological innovation, focusing on clinical data science projects that span several therapeutic areas. Iryna joined Pfizer from Qualia, a leader in cloud-based intent targeting and cross-screen audience association that empowers marketers to target and respond to real-time expressions of consumer intent across many channels and devices. At Qualia, Iryna led the data science team and was responsible for all aspects of data and analytics, ranging from data acquisition and engineering to analytics, machine learning, and natural language processing. Before Qualia, Iryna worked as a lead data scientist at a couple of data-driven start-ups in the NY area, where she was responsible for setting up the data science teams and developing the analytic products portfolio. Before moving to industry, Iryna spent over 10 years in academic research. She also worked at the IBM T.J. Watson Research Center and at Bell Labs, where her research focused on representation learning, semi-supervised learning, data complexity analysis, neural networks, and multi-classifier ensemble models. Iryna has served as a program committee member for several IEEE conferences and for Elsevier Pattern Recognition Magazine. Iryna holds a Ph.D. and a Ph.Lic. in Computer Science and Information Systems and Mathematical Information Technology from the University of Jyväskylä, Finland, and an M.Sc. in Computer Science and Artificial Intelligence from Kharkiv National University of Radio Electronics, Ukraine.

How can we assist you?

We'll be updating the website as information becomes available. If you have a question that requires immediate attention, please feel free to contact us. Thank you!