KDD Cup 2019
KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining, the leading professional organization of data miners.
Updates
- KDD Cup Special Day & Workshop Agenda
- Information for presenters at KDD Cup Day
- Call for KDD Cup Posters
- Special KDD Cup Issue. Call for Papers (coming soon)
- Call for sponsorship proposals (closed)
KDD Cup 2019 competition tracks:
Across the three competitions, more than 2,800 teams registered from over 39 countries and 230 academic and research institutions. Of those, 1,200 teams participated actively, comprising over 5,000 individuals who made 17,000 submissions overall. Total rewards exceeding $100K will be given to the winners and leaders of the competitions.
- Regular Machine Learning Competition Track (Regular ML Track) (April 10 - July 15, 2019 - closed)
- Automated Machine Learning Competition Track (Auto-ML Track) (April 1 - July 20, 2019 - closed)
- “Research for Humanity” Reinforcement Learning Competition Track (Humanity RL Track) (April 15 - July 20, 2019 - closed)
Regular Machine Learning Competition Track (Regular ML Track)
Context-aware multi-modal transportation recommendation aims to recommend a travel plan that considers various unimodal transportation modes, such as walking, cycling, driving, and public transit, and how to connect these modes under various contexts. The successful development of multi-modal transportation recommendations can have a number of advantages, including but not limited to reducing transport times, balancing traffic flows, reducing traffic congestion, and ultimately, promoting the development of intelligent transportation systems.
Despite the popularity and frequent use of transportation recommendation in navigation apps (e.g., Baidu Maps and Google Maps), existing transportation recommendation solutions only consider routes in one transportation mode. Intuitively, in the context-aware multi-modal transportation recommendation problem, transport mode preferences vary across users and spatiotemporal contexts.
Sponsor: Baidu.com
Total reward: $45,000
Automated Machine Learning Competition Track (Auto-ML Track)
Temporal relational data is very common in industrial machine learning applications, such as online advertising, recommender systems, financial market analysis, medical treatment, fraud detection, etc. With timestamps to indicate the timings of events and multiple related tables to provide different perspectives, such data contains useful information that can be exploited to improve machine learning performance. However, currently, the exploitation of temporal relational data is often carried out by experienced human experts with in-depth domain knowledge in a labor-intensive trial-and-error manner.
In this challenge, participants are invited to develop AutoML solutions to binary classification problems for temporal relational data. The provided datasets are in the form of multiple related tables, with timestamped instances. Five public datasets (without labels in the testing part) are provided to the participants so that they can develop their AutoML solutions. Afterward, solutions will be evaluated with five unseen datasets without human intervention. The results of these five datasets determine the final ranking.
Sponsor: 4Paradigm.com
Total reward: $33,500
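As a rough illustration of the kind of feature that timestamped, multi-table data invites, the sketch below aggregates events from a related table for each row of a main table, using only events at or before that row's timestamp so that no future information leaks into the features. The table layout, the field names (key, timestamp, amount), and the function name are hypothetical; they are not taken from the competition datasets or from any participant's solution.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_features(main_rows, related_rows):
    """Count and sum related events at or before each main row's timestamp.

    main_rows:    list of (key, timestamp) tuples           (hypothetical layout)
    related_rows: list of (key, timestamp, amount) tuples   (hypothetical layout)
    """
    # Group related events by key.
    by_key = defaultdict(list)
    for key, ts, amount in related_rows:
        by_key[key].append((ts, amount))

    # For each key, keep sorted timestamps and prefix sums of the amounts.
    times, prefix = {}, {}
    for key, events in by_key.items():
        events.sort()
        times[key] = [ts for ts, _ in events]
        running, sums = 0.0, [0.0]
        for _, amount in events:
            running += amount
            sums.append(running)
        prefix[key] = sums

    features = []
    for key, ts in main_rows:
        # Number of related events with timestamp <= ts (nothing from the future).
        n = bisect_right(times.get(key, []), ts)
        total = prefix.get(key, [0.0])[n]
        features.append({"event_count": n, "amount_sum": total})
    return features
```

Features built this way could then be fed to any binary classifier; an automated solution would have to generate and select such aggregations without human intervention.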
“Research for Humanity” Reinforcement Learning Competition Track (Humanity RL Track)
Malaria is thought to have had the greatest disease burden throughout human history, and it continues to pose a significant and disproportionate global health burden. Fifty percent of the world’s population is at risk of malaria infection. Sub-Saharan Africa is most affected, with 90% of all cases.
Through this KDD Cup Humanity RL track competition, we are looking for participants to apply machine learning tools to determine novel solutions that could impact malaria policy in Sub-Saharan Africa. Specifically, how should combinations of interventions that control the transmission, prevalence, and health outcomes of malaria infection be distributed in a simulated human population?
Sponsor: IBM Africa and Hexagon-ML.com
Total reward: $25,000
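To make the Humanity RL task more concrete, here is a minimal sketch of searching over combinations of two intervention coverage levels with an epsilon-greedy strategy. Everything in it is an assumption made for illustration: the two-intervention action grid, the function names, and especially toy_simulator, which is a made-up reward surface and not the simulated environment used in the competition.

```python
import random
from itertools import product


def toy_simulator(action, rng, noise=0.1):
    """Hypothetical reward for a pair of coverage levels in [0, 1].

    This is a made-up surface peaking at (0.2, 0.8), not a malaria model.
    """
    a, b = action
    jitter = rng.gauss(0.0, noise) if noise > 0 else 0.0
    return -(a - 0.2) ** 2 - (b - 0.8) ** 2 + jitter


def epsilon_greedy_search(simulator, actions, episodes, epsilon, rng):
    """Return the action with the highest estimated mean reward."""
    counts = {a: 0 for a in actions}
    means = {a: 0.0 for a in actions}
    for step in range(episodes):
        if step < len(actions):
            action = actions[step]                 # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)           # explore a random action
        else:
            action = max(actions, key=means.get)   # exploit the current best
        reward = simulator(action, rng)
        counts[action] += 1
        # Incremental update of the running mean reward for this action.
        means[action] += (reward - means[action]) / counts[action]
    return max(actions, key=means.get)


# Hypothetical action grid: coverage of two interventions in steps of 0.2.
levels = [i / 5 for i in range(6)]
actions = list(product(levels, repeat=2))
```

Calling epsilon_greedy_search(toy_simulator, actions, 500, 0.1, random.Random(0)) returns the coverage pair with the best estimated reward. The search treats each combination as an independent arm, which only makes sense when, as here, evaluations are scarce and the action set is small; the number of episodes should be at least len(actions) so every combination is tried once.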
KDD Cup Innovation Award
To foster innovation in data science competitions and encourage the community, this year we have established a special KDD Cup Innovation Award. In 2019, the KDD Cup Innovation Award is given to Hexagon-ML, our competition platform sponsor for the Reinforcement Learning competition.
Hexagon-ML receives this award for:
- Pioneering a new type of competition, the Reinforcement Learning Competition, and implementing it under a novel computational environment for KDD Cup 2019.
- Contributing to solving the malaria problem for humanity in a successful collaboration with IBM Research Africa. In 2019, the re-emergence of infectious diseases is among the top 10 humanitarian crises in the world.
- Efforts in gathering, developing, and growing a data science community in Reinforcement Learning.
KDD Cup Day
Panel 1: “How should companies use competition platforms?”
Companies have used data science competitions as a strategy to bring about cultural change or even to crowdsource their problems to external teams. Netflix is one recent example: it pioneered this practice by crowdsourcing its recommendation algorithm. Further, data science competition companies, such as Kaggle, Hexagon-ML, and others, host competitions that are either sponsored by companies on their platforms or run inside the companies themselves. In this panel, we will discuss with some of the industry leaders how companies should use data science competition platforms.
Moderator: Taposh Dutta-Roy
Panelists:


Jason Jones is Chief Data Scientist at Health Catalyst. He is passionate about making the use of data in healthcare easier and helping organizations find analytic focus. Jason formerly held two senior analytics leadership roles at Kaiser Permanente (KP), as well as analytical and marketing positions at Intermountain Healthcare and Bayer Healthcare. Prior to that, Jason worked at Ingenix (now Optum), where he succeeded in converting United Healthcare data into a saleable asset for external customers conducting outcomes research. Throughout his academic career, Jason has taught graduate courses in statistics to medical informaticists at the University of Southern California and at the University of Utah. He has published dozens of peer-reviewed papers in medicine, predictive modeling, and outcomes improvement. Jason received his PhD in Biostatistics from the University of Southern California in 2001.
Jason Jones believes better use of data is critical to achieving the Quadruple Aim. For people, it means having access to options and guidance to achieve our goals in the context of our preferences and circumstances. For providers, it means feeling supported in applying expertise and caring to achieve and sustain health. For organizations, it means achieving better focus and resource application for sustainable improvement. Providing the right information, to the right person, and at the right time augments individual intelligence and team success.

Panel 2: “How will AutoML change the future of data science?”
AutoML, as a concept and as a product, gained traction several years ago and has been increasing in popularity and complexity ever since. Originally designed to automate certain steps that are beyond the abilities of non-experts, it also makes data scientists more productive, inevitably shifting their perspective and focus and calling for different skills. During this panel, we hope to collect the opinions of people who invent, create, and use AutoML. In particular, we are interested in discussing non-trivial cases and applications of AutoML, current limitations, the variety of existing products and how they are meeting new demands, emerging applications, and overall progress in the area over the past few years, and in debating how data science jobs will change under the influence of AutoML.
Moderator: Iryna Skrypnyk
Panelists:



Ganesh Thondikulam is an experienced IT leader with a distinguished career architecting and leading the on-time, within-budget design and delivery of enterprise-wide solutions to meet business, financial, and market demands. Ganesh currently holds the position of Executive Director at Analytics Digital Foundation at Kaiser Permanente (KP), where he is responsible for establishing the vision, laying the foundation, and setting the technology direction for KP’s data and analytics capabilities across the enterprise. Prior to that, Ganesh was a Principal for Intelligence Systems, Enterprise & Health Plan Architecture at Kaiser Permanente (KP). He joined Kaiser Permanente from Blue Shield of California, where he was Head of Enterprise Architecture. Earlier in his career, Ganesh gained extensive experience working in IT architect roles at several other companies.
Ganesh completed an executive education program in Strategic Leadership of Technology and Innovation at Stanford University and an MBA in e-Commerce and General Management at the Opus College of Business at St. Thomas University, St. Paul, MN. He received an M.Sc. in Physics from the University of Texas at El Paso.

Blog posts
- https://medium.com/@taposhdr/reinforcement-learning-to-eradicate-malaria-with-ai-49e9e4016665
- https://www.kdd.org/kdd2019/News/view/a-word-from-kdd-cup-2019-organizers
Visa Support
The information for visa support letters can be found here.
KDD Cup Chairs
kddcup2019@kdd.org

Taposh Dutta-Roy leads the Innovation Team of the Decision Support Group at Kaiser Permanente (KP). His work focuses on journey analytics, deep learning, data science architecture, and strategy. Prior to KP, Taposh held Head of Ad Products positions at the start-up companies Inpowered and Netshelter (sold to Ziff Davis). Prior to the start-ups, he worked as a Senior Associate Consultant at the MIT-based consulting company Sapient. He co-founded the biotech company Bio-Integrated Solution, which developed DNA sequencers and liquid handling devices for proteomics. He has an M.Sc. in Electrical Engineering and Computer Engineering from Illinois Institute of Technology and an MBA from UC Davis. During his PhD program, his research focused on Cellular Monte Carlo simulation of device physics.
His experience spans diverse domains: biotech start-ups (www.biointsol.com), entertainment start-ups (www.fandango.com), financial firms (Countrywide Home Loans and Franklin Templeton Investments), consumer electronics (Emerson Electric and Dolby), insurance (WCIRB), consulting (SapientNitro), and most recently, social media (www.inpwrd.com). He has a unique combination of product, technology, strategy consulting, data science, and start-up experience. He is a consumer-focused machine learning and data science geek.

Wenjun Zhou is currently an Associate Professor at the Haslam College of Business, the University of Tennessee Knoxville, where she teaches data mining, text mining, and multivariate methods. Her general research interests include data mining, business analytics, and statistical computing. She has published prolifically in refereed conference proceedings and journals, such as KDD, ICDM, TKDE, and the Machine Learning journal. Wenjun was nominated the George and Margaret Melton Faculty Fellow in 2019, the Roy & Audrey Fancher Faculty Research Fellow in 2017, and the R. Stanley Bowden II Faculty Research Fellow in 2016. She is the recipient of the Best Paper Award at INFORMS Data Science 2018, the Best Student Paper Award at AOM 2017, the Best Paper Award at WAIM 2013, the Best Student Paper Runner-Up Award at KDD 2008, and the Best Paper Runner-Up Award at ICTAI 2006. She was among the top five finalists for the ACM SIGKDD Doctoral Dissertation Award in 2011. Wenjun has worked with a variety of companies on data analytics projects, including Panasonic, Yahoo!, IBM, CareerBuilder, Bush Brothers, Procter & Gamble, Capital One, and Coca-Cola. Wenjun serves regularly on NSF review panels and program committees of international conferences. She is a senior member of ACM and IEEE, and a member of INFORMS.

Iryna Skrypnyk is currently a Senior Manager at the Global Real-World Data and Analytics Division of Pfizer Digital, where she leads AI and machine learning technological innovation, focusing on clinical data science projects that span several therapeutic areas. Iryna joined Pfizer from Qualia, a leader in cloud-based intent targeting and cross-screen audience association that empowers marketers to target and respond to real-time expressions of consumer intent across many channels and devices. At Qualia, Iryna led the data science team and was responsible for all aspects of data and analytics, ranging from data acquisition and engineering to analytics, machine learning, and natural language processing. Prior to Qualia, Iryna worked as a lead data scientist at a couple of data-driven start-ups in the NY area, where she was responsible for setting up the data science teams and developing the analytic products portfolio. Before moving to industry, Iryna spent over 10 years in academic research. She also worked at the IBM T.J. Watson Research Center and at Bell Labs, where her main areas of research were representation learning, semi-supervised learning, data complexity analysis, neural networks, and multi-classifier ensemble models. Iryna served as a program committee member at several IEEE conferences and for Elsevier Pattern Recognition Magazine. Iryna holds a Ph.D. and Ph.Lic. in Computer Science and Information Systems and Mathematical Information Technology from the University of Jyväskylä, Finland, and an M.Sc. in Computer Science and Artificial Intelligence from Kharkiv National University of Radio Electronics, Ukraine.
How can we assist you?
We'll be updating the website as information becomes available. If you have a question that requires immediate attention, please feel free to contact us. Thank you!