Invited Keynote Speakers


TITLE: Convex Optimization: From Embedded Real-Time To Large-Scale Distributed

Convex optimization has emerged as a useful tool for applications that include data analysis and model fitting, resource allocation, engineering design, network design and optimization, finance, control, and signal processing. After an overview, the talk will focus on two extremes: real-time embedded convex optimization and distributed convex optimization. Code generation can produce extremely efficient and reliable solvers for small problems that execute in milliseconds or microseconds and are ideal for embedding in real-time systems. At the other extreme, we describe methods for large-scale distributed optimization, which coordinate many solvers to solve enormous problems.
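The coordination of many solvers described above can be illustrated with consensus ADMM, a standard distributed-optimization method that Boyd and coauthors have popularized. Below is a minimal NumPy sketch, not the speaker's actual software, for a least-squares problem split across three workers; all data, sizes, and the penalty parameter `rho` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy global least-squares problem, split across 3 "workers".
A = [rng.standard_normal((20, 5)) for _ in range(3)]
b = [rng.standard_normal(20) for _ in range(3)]
n, rho = 5, 1.0

x = [np.zeros(n) for _ in range(3)]   # local primal variables
u = [np.zeros(n) for _ in range(3)]   # local (scaled) dual variables
z = np.zeros(n)                       # shared consensus variable

for _ in range(200):
    # x-update: each worker solves its own regularized least squares.
    for i in range(3):
        x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(n),
                               A[i].T @ b[i] + rho * (z - u[i]))
    # z-update: averaging is the only communication step.
    z = np.mean([x[i] + u[i] for i in range(3)], axis=0)
    # u-update: each worker adjusts its dual variable locally.
    for i in range(3):
        u[i] += x[i] - z

# Compare with the centralized solution of the stacked problem.
A_full, b_full = np.vstack(A), np.concatenate(b)
x_star = np.linalg.lstsq(A_full, b_full, rcond=None)[0]
print(np.linalg.norm(z - x_star))  # should be near zero at consensus
```

The appeal for enormous problems is that each x-update touches only one worker's data, so the data never has to live on a single machine.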

Stephen P. Boyd is the Samsung Professor of Engineering and Professor of Electrical Engineering in the Information Systems Laboratory at Stanford University. He also holds a courtesy appointment in the Department of Management Science and Engineering and is a member of the Institute for Computational and Mathematical Engineering. His current research focuses on convex optimization applications in control, signal processing, and circuit design.

Professor Boyd received an AB degree in Mathematics, summa cum laude, from Harvard University in 1980 and a PhD in EECS from U.C. Berkeley in 1985. He holds an honorary doctorate from the Royal Institute of Technology (KTH), Stockholm. In 1985 he joined the faculty of Stanford's Electrical Engineering Department.

He is the author of many papers and three books: Linear Controller Design: Limits of Performance (with Craig Barratt, 1991), Linear Matrix Inequalities in System and Control Theory (with L. El Ghaoui, E. Feron, and V. Balakrishnan, 1994), and Convex Optimization (with Lieven Vandenberghe, 2004). His group has produced several open source tools, including CVX (with Michael Grant), a widely used parser-solver for convex optimization.


TITLE: Internet Scale Data Analysis

This talk covers techniques for analyzing data sets with up to trillions of examples and billions of features, using thousands of computers. Operating at this scale requires an understanding of an increasingly complex hardware hierarchy (e.g., cache, memory, SSD, another machine in the rack, disk, a machine in another data center, ...); a model for recovering from inevitable hardware and software failures; a machine learning model that allows efficient computation over large, continuously updated data sets; and a way to visualize and share the results.
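One widely used programming model for this kind of failure-tolerant, many-machine computation is MapReduce. The sketch below is a toy single-process word count, not Google's infrastructure; the shard contents and function names are illustrative. The key property is that map tasks are independent and idempotent, so a failed machine's shard can simply be rerun elsewhere.

```python
from collections import defaultdict

def map_phase(doc):
    # Emit (word, 1) pairs from one input shard.
    return [(w, 1) for w in doc.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

shards = ["the quick brown fox", "the lazy dog", "the fox"]

# Because map tasks share no state, rerunning a shard after a
# machine failure produces the same pairs -- recovery is a retry.
pairs = [p for shard in shards for p in map_phase(shard)]
counts = reduce_phase(shuffle(pairs))
print(counts["the"])  # 3
```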

Peter Norvig is a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. At Google Inc. he was Director of Search Quality, responsible for the core web search algorithms, from 2002 to 2005, and has been Director of Research since 2005.

Previously he was the head of the Computational Sciences Division at NASA Ames Research Center, making him NASA's senior computer scientist. He received the NASA Exceptional Achievement Award in 2001. He has served as an assistant professor at the University of Southern California and a research faculty member at the University of California at Berkeley Computer Science Department, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He has over fifty publications in Computer Science, concentrating on Artificial Intelligence, Natural Language Processing, and Software Engineering, including the books Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg PowerPoint Presentation and the world's longest palindromic sentence.


TITLE: Cancer Genomics

Throughout life, the cells in every individual accumulate many changes in the DNA inherited from his or her parents. Certain combinations of changes lead to cancer. During the last decade, the cost of DNA sequencing has been dropping by a factor of 10 every two years, making it now possible to read most of the three-billion-base genome from a patient's tumor and to try to determine all of the thousands of DNA changes in it. Under the auspices of NCI's Cancer Genome Atlas Project, 10,000 tumors will be sequenced in this manner in the next two years. Soon cancer genome sequencing will be widespread clinical practice, and millions of tumors will be sequenced. A massive computational problem looms in interpreting these data.

The first challenge arises because we can only read short pieces of DNA: we must assemble a coherent and reliable representation of the tumor genome from massive amounts of incomplete and error-prone evidence. Second, every human genome is unique from birth, and every tumor is a unique variant. There is no single route to cancer. We must learn to read the varied signatures of cancer within the tumor genome and associate them with optimal treatments. Hundreds of molecularly targeted cancer treatments are already available, each known to be more or less effective depending on specific genetic variants. However, targeting a single gene with one treatment rarely works. The second challenge is to tackle the combinatorics of personalized, targeted, combination therapy in cancer.
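The core idea behind assembling a genome from short reads can be shown on a toy case. The sketch below reconstructs a sequence from its overlapping k-mers with a simple de Bruijn-style walk; it is illustrative only, with made-up data, and assumes error-free reads with full coverage and no repeats, the very complications (plus billions of reads) that make real assembly so hard.

```python
def kmers(seq, k):
    # All length-k substrings ("reads") of seq, in order.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def assemble(reads):
    k = len(reads[0])
    # Map each (k-1)-base prefix to the read that starts with it.
    by_prefix = {r[:-1]: r for r in reads}
    suffixes = {r[1:] for r in reads}
    # The starting read is the one whose prefix is nobody's suffix.
    start = next(r for r in reads if r[:-1] not in suffixes)
    contig = start
    # Repeatedly extend by the read overlapping the contig's tail.
    while contig[-(k - 1):] in by_prefix:
        contig += by_prefix[contig[-(k - 1):]][-1]
    return contig

genome = "ATGGCGTGCA"             # toy "tumor genome"
reads = kmers(genome, 4)          # error-free, fully covering reads
print(assemble(reads))            # ATGGCGTGCA
```

With sequencing errors and repeated regions, the overlap graph gains spurious and ambiguous edges, which is why real assemblers need massive statistical machinery rather than a single greedy walk.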

David Haussler's research lies at the interface of mathematics, computer science, and molecular biology. He develops new statistical and algorithmic methods to explore the molecular function and evolution of the human genome, integrating cross-species comparative and high-throughput genomics data to study gene structure, function, and regulation. He is credited with pioneering the use of Hidden Markov Models (HMMs), Stochastic Context-Free Grammars, and discriminative kernel methods for analyzing DNA, RNA, and protein sequences. He was the first to apply the latter methods to the genome-wide search for gene expression biomarkers in cancer, now a major effort of his laboratory.

As a collaborator on the international Human Genome Project, his team posted the first publicly available computational assembly of the human genome sequence on the internet on July 7, 2000. Following this, his team developed the UCSC Genome Browser, a web-based tool that is used extensively in biomedical research and serves as the platform for several large-scale genomics projects, including NHGRI's ENCODE project, which uses omics methods to explore the function of every base in the human genome (and for which UCSC serves as the Data Coordination Center); NIH's Mammalian Gene Collection; NHGRI's 1000 Genomes Project, which explores human genetic variation; and NCI's Cancer Genome Atlas project, which explores the genomic changes in cancer.

His group's informatics work on cancer genomics provides a complete analysis pipeline from raw DNA reads through the detection and interpretation of mutations and altered gene expression in tumor samples. His group collaborates with researchers at medical centers nationally, including members of the Stand Up To Cancer "Dream Teams" and the Cancer Genome Atlas, to discover molecular causes of cancer and pioneer a new personalized, genomics-based approach to cancer treatment.

He co-founded the Genome 10K Project to assemble a genomic zoo - a collection of DNA sequences representing the genomes of 10,000 vertebrate species - to capture genetic diversity as a resource for the life sciences and for worldwide conservation efforts.

Haussler received his Ph.D. in computer science from the University of Colorado at Boulder. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences and a fellow of AAAS and AAAI. He has won a number of awards, including the 2009 ASHG Curt Stern Award in Human Genetics, the 2008 Senior Scientist Accomplishment Award from the International Society for Computational Biology, the 2006 Dickson Prize for Science from Carnegie Mellon University, and the 2003 ACM/AAAI Allen Newell Award in Artificial Intelligence.

Investigator, Howard Hughes Medical Institute
Distinguished Professor, Biomolecular Engineering, University of California, Santa Cruz
Scientific Co-Director, California Institute for Quantitative Biosciences (QB3)
Director, Center for Biomolecular Science & Engineering



I will review concepts, principles, and mathematical tools that have proven useful in applications involving causal and counterfactual relationships. This semantic framework, enriched with a few ideas from logic and graph theory, gives rise to a complete, coherent, and friendly calculus of causation that unifies the graphical and counterfactual approaches to causation and resolves many long-standing problems in several of the sciences. These include questions of causal effect estimation, policy analysis, and the integration of data from diverse studies. Of special interest to KDD researchers are the following topics:

  1. The Mediation Formula, and what it tells us about direct and indirect effects.
  2. What mathematics can tell us about "external validity," or generalizing from experiments.
  3. What graph theory can tell us about recovering from sample-selection bias.
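As a small taste of the calculus of causation, here is a toy numeric instance of the adjustment ("back-door") formula, P(y | do(x)) = Σ_z P(y | x, z) P(z), which identifies an interventional distribution from purely observational data when a confounder Z blocks all back-door paths from X to Y. The joint distribution below is made up for illustration.

```python
# Joint distribution P(z, x, y) over binary variables, where Z is a
# confounder of X and Y satisfying the back-door criterion.
P = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.10,
    (0, 1, 0): 0.05, (0, 1, 1): 0.05,
    (1, 0, 0): 0.08, (1, 0, 1): 0.02,
    (1, 1, 0): 0.08, (1, 1, 1): 0.32,
}

def marg_z(z):
    # Marginal P(Z = z).
    return sum(P[(z, x, y)] for x in (0, 1) for y in (0, 1))

def cond_y(y, x, z):
    # Conditional P(Y = y | X = x, Z = z).
    return P[(z, x, y)] / sum(P[(z, x, yy)] for yy in (0, 1))

def do_effect(y, x):
    # Adjustment formula: average the conditional over the confounder.
    return sum(cond_y(y, x, z) * marg_z(z) for z in (0, 1))

# Observational P(Y = 1 | X = 1), for comparison.
p_obs = sum(P[(z, 1, 1)] for z in (0, 1)) / sum(
    P[(z, 1, y)] for z in (0, 1) for y in (0, 1))

# Interventional and observational probabilities differ (0.65 vs 0.74),
# because Z influences both who "takes" X = 1 and the outcome.
print(do_effect(1, 1), p_obs)
```

The point of the graphical calculus is that it tells you, from the diagram alone, when such an adjustment is valid and which Z to adjust for.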

Judea Pearl is a professor of computer science and statistics at the University of California, Los Angeles. A graduate of the Technion, Israel, he joined the faculty of UCLA in 1970, where he currently directs the Cognitive Systems Laboratory and conducts research in artificial intelligence, causal inference, and the philosophy of science. He has authored three books: Heuristics (1984), Probabilistic Reasoning (1988), and Causality (2000; 2009). A member of the National Academy of Engineering and a Founding Fellow of the American Association for Artificial Intelligence (AAAI), Judea Pearl is the recipient of the 2008 Benjamin Franklin Medal for Computer and Cognitive Science and this year's David Rumelhart Prize from the Cognitive Science Society.