Tutorial: Principles and Applications of Probabilistic Learning


Presenter:

Padhraic Smyth (University of California, Irvine)
http://www.ics.uci.edu/~smyth

Abstract:

This tutorial will provide an overview of general principles of learning probabilistic models from data, and illustrate these principles using a number of different real-world applications. The tutorial will focus on an approach known as generative modeling, whereby a probability model is learned that approximates the process thought to have generated the data. This general approach can be particularly useful for data analysis problems that cannot easily be cast in more traditional multivariate predictive modeling frameworks. These include problems involving variable-length data records, hierarchies, relational information, temporal and spatial dependencies, hidden variables, and so forth.

The tutorial will begin by focusing on a particular class of probabilistic models known as directed graphical models, emphasizing how such models can be used both as an elegant representational language and as a general computational framework for reasoning about relatively complex sets of variables. Building on the language of graphical models, we will then discuss how parameters and structure can be learned from data, reviewing general principles such as maximum likelihood, Bayesian inference, and learning with hidden variables. Real-world data sets will illustrate how these principles can be applied to a broad variety of problems involving complex data, including applications from areas such as information extraction, Web mining, bioinformatics, and climate science.

Intended Audience:

The intent of this tutorial is to provide a starting point for students and researchers interested in learning more about the underlying principles of techniques such as Bayesian learning and how such techniques can be applied in practice. While many of the ideas in the tutorial are founded on statistical concepts, no specific background in statistics will be assumed. However, an understanding of basic concepts in probability will be helpful. The tutorial will emphasize concepts rather than mathematical rigor where possible and will strive to provide a consistent view of probabilistic learning that unifies seemingly disparate approaches and methods.

Biography:

Padhraic Smyth is a Professor in the Bren School of Information and Computer Science at the University of California, Irvine. He is also a member of the Institute for Mathematical Behavioral Sciences, the Institute for Genomics and Bioinformatics, and the Department of Biomedical Engineering (all at UC Irvine). Dr. Smyth's research interests include machine learning, data mining, statistical pattern recognition, applied statistics, and information theory. He was a co-recipient of best paper awards at the 2002 and 1997 ACM SIGKDD Conferences, an IBM Faculty Partnership Award in 2001, a National Science Foundation Faculty CAREER award in 1997 and the Lew Allen Award for Excellence in Research at JPL in 1993. Dr. Smyth is co-author of Modeling the Internet and the Web: Probabilistic Methods and Algorithms (with Pierre Baldi and Paolo Frasconi), published by Wiley in 2003. He is also co-author of an introductory text on data mining, Principles of Data Mining, MIT Press, August 2001, with David Hand and Heikki Mannila, and he was co-editor of Advances in Knowledge Discovery and Data Mining, published by MIT Press in 1996. He recently served as associate editor for the Journal of the American Statistical Association and the IEEE Transactions on Knowledge and Data Engineering, has served as an action editor for the Machine Learning Journal, is a founding associate editor for the Journal of Data Mining and Knowledge Discovery, and a founding editorial board member of the Journal of Machine Learning Research. In addition to academic research he is an active consultant to industry on a variety of problems involving analysis of large data sets, including text documents, Web clickstreams, and various forms of multivariate and time-series data.

Dr. Smyth received a first class honors degree in Electronic Engineering from University College Galway (National University of Ireland) in 1984, and the MSEE and PhD degrees from the Electrical Engineering Department at the California Institute of Technology in 1985 and 1988 respectively. From 1988 to 1996 he was a Technical Group Leader at the Jet Propulsion Laboratory, Pasadena, and has been on the faculty at UC Irvine since 1996.
