Applied Data Science Invited Speakers

The Applied Data Science Invited Talks will provide a venue for leading experts in applied data mining and knowledge discovery. These invited talks will feature highly influential speakers who have directly contributed to successful data mining applications in their respective fields. The talks and discussions will focus on innovative, leading-edge, large-scale industry or government applications of data mining in areas such as finance, healthcare, bioinformatics, public policy, infrastructure, telecommunications, social media and computational advertising.

Keynote: Jan Schellenberger

Using Machine Learning to Detect Cancer Early

"GRAIL's mission is to detect cancer early, when it can be cured. Building a classifier that can detect cancer early in a clinical setting is a complicated endeavor with unique challenges: data acquisition and stabilization can take years; cancer status (labels) can be ambiguous, noisy, and changing; sequencing data can be enormous and presents scaling issues. Because the cancer early detection machine learning classifier is being built in the context of a clinical trial environment, an extra level of rigor and planning is required. Through the Circulating Cell-free Genome Atlas study, GRAIL has collected blood samples from >15,000 patients with and without cancer on which to train and validate the classifier. Results on a validation set show that the GRAIL classifier can detect >50 cancers with >99% specificity. The correct tissue of origin can be identified approximately 90% of the time."

Jan Schellenberger leads the classification infrastructure team at GRAIL, which builds a platform for robust machine learning at scale. Prior to GRAIL, he worked at LinkedIn on the advertising optimization team and at ID:Analytics, detecting identity manipulation and fraud. Jan has a PhD in Bioinformatics and Systems Biology from the University of California, San Diego, where he studied genome-scale metabolic models using Monte Carlo sampling.