Abstract

1. The industrialization of data science has changed the demographics of our profession and the attendance profile at our conferences. Unlike the 1990s and early 2000s, the majority of our attendees now self-identify as practitioners, not researchers. However, these new attendees find it difficult to extract full value from our content and network.

2. We feel the goal of KDD is to maximize the global impact of data science technologies and professionals over the next 10+ years. At a more personal level, our goal is to help data scientist attendees succeed in their careers, whether in research or in practice. Instead of only maximizing attendance at the conference, we aim to increase our value to the community, including the much larger group of data scientists who do not attend the conference in person.

3. We describe a series of planned innovations at KDD 2016 that aim to make our content and network more actionable. We hypothesize that the innovations will a) expand the impact of the content presented via the conference; and b) help our online and offline audience form natural and effective sub-communities based on shared interests.

4. We view the innovations this year as risky experiments. Based on qualitative and quantitative measurements and on feedback collected from our audience, we plan to iteratively change our conference over the next several years to better serve the needs of our community.

Background trends and their impact on KDD: a) Industrialization of data science, and b) dramatic research advances

Over the last 3 decades, data science has steadily matured as a profession. In the 1990s, projects often required researchers with PhDs in machine learning or statistics to invent new mathematical algorithms and to develop custom code from first principles, in ways that were not very reusable across projects. By contrast, we are now seeing the emergence of industrial assembly-line processes characterized by the division of labor, integrated pipelines of work, standards, automation, and repeatability.

The industrialization of data science over the last 3 decades has changed the demographics and needs of our profession. Unlike the 1990s and early 2000s, the majority of the attendees at all the major data science conferences (NIPS, ICML, KDD, etc.) no longer self-identify as researchers. The majority of participants are now engineers or practitioners who consume but do not publish novel research. A large part of our community no longer has a PhD.

However, at the same time we are also witnessing astonishing progress from research in algorithms and systems. For example, the field of deep neural networks has revolutionized speech recognition, NLP, computer vision, image recognition, etc. As a second example, recent distributed systems and statistics research describes how to preserve privacy and confidentiality while simultaneously allowing the mining of globally distributed data to achieve societal benefit, and how to achieve this in environments with constraints on communication bandwidth, power consumption, and computation on each node (for example, in Internet of Things applications). This progress has fueled a dramatic surge in investments from VCs, startups and large companies like Facebook, Amazon, Microsoft and Google to commercialize these rapid advances.

The role of KDD in the ecosystem of data science conferences

We respect and value our sister conferences in the field of data science including ICML, NIPS, ICLR, AISTATS, Strata, Hadoop World, Predictive Analytics World, etc. Many of our organizing committee members regularly publish at these venues, and we find them incredibly useful to attend.

However, KDD has a unique role in this ecosystem of conferences. The purpose of the KDD conference is to enable data scientists to be more successful in their careers by helping them bridge theory and practice.

Thus, uniquely among the leading conferences, we have equally prestigious parallel tracks that emphasize algorithmic research and lessons learned from large-scale deployments in practice. By facilitating interaction between practitioners at large companies and startups on the one hand, and algorithm development researchers, including leading academics, on the other, KDD’16 fosters technological and entrepreneurial innovation in the area of data science.

To make sure that as a community we do not lose sight of the forest for the trees, we invite a large group of founders, CXOs, VPs and other senior executives from leading data-centric companies to help our audience understand large underlying trends and their business and technological impact. We feature hands-on tutorials and workshops that help practitioners stay abreast of recent theoretical developments and their implementation in open source tools that they can immediately use in production.

Finally, we also facilitate networking with venture capitalists, enable companies to hire leading engineers and researchers, and facilitate the formation and growth of online and offline professional networks.

Why should you attend KDD? What is the goal of the conference?

Our audience attends KDD in order to a) learn state-of-the-art content, or b) form or grow a professional network.

We help our audience identify technology trends early, make new/creative contributions at work either as researchers or as practitioners, increase productivity by using newer/better tools or techniques, identify new job opportunities for themselves, and hire new team members.

KDD also enables practitioners to enrich their professional networks, to obtain professional mentoring, or to connect to colleagues to understand which algorithms or data proved most relevant for their problem. It has also introduced “office hours” for budding entrepreneurs to talk with venture capitalists.

Leading researchers are a very important part of our audience. By direct exposure to large real world deployments and the practical issues encountered therein, researchers tap into a rich vein of problems that they can address in the near future. This also helps them get jobs, find industry collaborators, and obtain funding.

The goal of the KDD conference is to maximize the global impact of data scientists and data science technologies across industries over an extended period of 10+ years.

Innovations at KDD 2016

We will introduce three types of changes at KDD 2016 designed to promote this objective.

First, we plan to make the majority of the content freely available online in perpetuity regardless of whether the user pays to attend the conference in person.

Second, we plan to help those who attend the conference in person to grow, and extract more value from, the professional network they form via KDD.

Third, we plan to introduce four innovations (described below) designed to make the content and the network more usable/actionable to a much larger audience of practitioners.

1. Hands-on Software Tutorials: In parallel with the main conference we plan to introduce two tracks of hands-on coding tutorials that aim to help practitioners starting from scratch quickly become capable of building practical solutions at work. Some tutorials will focus on software tools and infrastructure for data scientists, such as Spark, AWS, and Azure. Others will cover popular topics needed by practitioners, such as recommender systems, text mining, and deep learning. The goal of these sessions is to help our practitioner attendees achieve greater success in their jobs.

2. Make papers more accessible: Although this is not universally appreciated, every paper is an advertisement. It is in the interest of authors to convince a large audience of researchers and practitioners to use or build upon their novel contributions or algorithms. Publication of a research or applied practice paper helps authors sell the new technique to a global data science community in order to increase its impact. Common measures of research productivity, such as the count of publications or the number of citations, are objective but approximate ways to measure impact. Besides increasing conventional measures such as citations, we would also like to promote increased use by practitioners of software tools implementing the methods proposed at KDD.

Typically, KDD papers contain advanced technical material intended for an audience of researchers who are already familiar with the state of the art. To make KDD papers more accessible to the larger audience of practitioners, who often find this material hard to follow, KDD 2016 will give our authors the option to upload their content in multiple new formats. These include: a) a 2 minute summary video presentation that explains why readers should give the paper their attention, b) slides, c) data/code, d) a blog post, e) a full video talk uploaded by authors to YouTube before the conference, and f) detailed technical reports.

3. Curated Content & Online Discussions: KDD accepts a large number of high quality papers. The abundance of content makes it difficult for new practitioners to identify the two or three most important developments pertinent to their topics of interest.

To address this, we have grouped the accepted (camera ready) papers into a small set of topics. Each topic will have an online page that allows authors to upload relevant content (see above). Further, each page may include a blog post and a discussion forum seeded by experts in the topic, describing available tutorials, online courses, and the most important developments in the last 12 months, at KDD and possibly at related conferences. Online discussions will allow users worldwide to vote on content. We hope that this helps new practitioners identify which content to study first in order to get started quickly.

4. Online & Onsite Networking: We hope to facilitate the formation and growth of sub-communities and professional networks based on industry, shared interests, geography, etc. A moderated online discussion forum on Disqus, mirrored on Facebook, would allow users worldwide to communicate with authors, PC members or anyone else interested in the topic. Identities of members taking part in these fora would be optionally linked to their LinkedIn or Facebook accounts to facilitate the creation or growth of professional networks that may continue to be useful later. These online fora will have continuity across future KDD conferences. Each topic will also have the option to follow up with in-person networking sessions at the conference to allow attendees to meet others who share similar interests.

Measurements & feedback for continuous improvement

Although the innovations proposed above are rooted in surveys and semi-structured interviews, we view them as essential but controversial and risky experiments. We hope to learn from the experiments and improve in future years by collecting feedback systematically during and after the conference. We will collect data using three approaches.

First, we plan to establish an electronic voting mechanism during the conference on our web site (and possibly in the conference mobile app) to measure how much the audience liked each session or paper in terms of actionability, innovativeness, depth of technical content, accessibility, etc.

Second, we will also monitor the amount of content posted online and the volume of discussion under each topic, and ask users how many people they met specifically because of the facilitated online and onsite networking sessions.

Third, we will send a user survey after the conference to measure overall satisfaction and to collect unstructured comments about what worked and what did not.

Conclusion and future outlook

As data science matures as a profession, we are witnessing the emerging need for a non-profit professional body, similar to the American Medical Association, that provides a range of services such as setting standards, professional certification, continuing education, advising legislative bodies (e.g., on privacy laws), and managing press relations. We view the evolution of the KDD conference described here as part of a larger, organic effort to enable our profession to achieve greater impact on society, and to help our members succeed in their careers.

[1] This document is based on several hundred semi-structured interviews and 3 years of structured survey responses at recent KDD conferences to understand the emerging needs of attendees.

— Balaji Krishnapuram, Mohak Shah, Shipeng Yu

General Chairs, ACM SIGKDD 2016