Qi Lu, President of Online Services Division, Microsoft
Abstract The online services industry is growing rapidly: the worldwide online ad market is projected to grow from $48 billion in 2011 to $67 billion in 2013, of which 47% will come from display advertising and 53% from search advertising. The Online Services Division (OSD) within Microsoft is a leader in the consumer cloud space today, with a strong portfolio of three mutually reinforcing businesses: Search, Portal, and Advertising. They are supported by a shared foundational asset of Intent & Knowledge Stores and a shared technology platform supporting large-scale data and high-performance systems. MSN (Portal) and Bing (Search) generate the content, traffic, and data that make for an exciting, fertile environment for large-scale data mining practice and system development. Our advertisers are thus given more valuable targeting opportunities and better ROI, which in turn provide better economics and usability data and allow for higher-quality services for our advertisers and a better experience for our users. The ability to transform data into meaningful, actionable insight is an important source of competitive advantage for OSD. The data mining initiatives within the division continue to strive for excellence around the following goals: actionable insights through deep data analysis; data mining and data modeling at scale and with speed; increased productivity from deployed large-scale data systems and tools; improved product and service development and decision making gained from effective measurement and experimentation; and a mature data culture in product teams that makes the above possible. With many technical and data challenges ahead of us, we are committed to utilizing our huge data asset well to understand the needs, intent, and behavior of our users for the purpose of serving them better.
Bio As president of Microsoft's Online Services Division (OSD), Dr. Qi Lu leads the company's search and online advertising efforts. Dr. Lu oversees the OSD Research & Development team which has responsibility for the evolution of Microsoft's search, portal and advertising services; the Online Audience Business Group; and the Advertiser and Publisher Solutions Business Group. Dr. Lu reports to Microsoft chief executive officer Steve Ballmer. Prior to joining Microsoft, Dr. Lu spent 10 years as a Yahoo! senior executive. His roles included serving as the executive vice president of engineering for the company's Search and Advertising Technology Group where he oversaw the development of Yahoo!'s Web search and monetization platforms and vice president of engineering responsible for the technology development of Yahoo!'s search, e-commerce and local listings of businesses and products. Before joining Yahoo!, Dr. Lu worked as a research staff member at IBM's Almaden Research Center and Carnegie Mellon University and was a faculty member at Fudan University in China. He received his bachelor of science and master of science in computer science from Fudan University and his Ph.D. in computer science from Carnegie Mellon University. Dr. Lu holds 20 U.S. patents.

David Jensen, Department of Computer Science, University of Massachusetts Amherst
Abstract Research and applications in knowledge discovery and data mining increasingly address some of the most fundamental questions of social science: What determines the structure and behavior of social networks? What influences consumer and voter preferences? How does participation in social systems affect behaviors such as fraud, technology adoption, or resource allocation? Often for the first time, these questions are being examined by analyzing massive data sets that record the behavior and interactions of individuals in physical and virtual worlds.
A new kind of scientific endeavor - computational social science - is emerging at the intersection of social science and computer science. The field draws from a rich base of existing theory from psychology, sociology, economics, and other social sciences, as well as from the formal languages and algorithms of computer science. The result is an unprecedented opportunity to revolutionize the social sciences, expand the reach and impact of computer science, and enable decision-makers to understand the complex systems and social interactions that we must manage in order to address fundamental challenges of economic welfare, energy production, sustainability, health care, education, and crime.
Computational social science suggests an impressive array of new tasks and technical challenges to researchers and practitioners of KDD. These include modeling complex systems with temporal, spatial, and relational dependence; identifying cause and effect rather than mere association; modeling systems with feedback; and conducting analyses in ways that protect the privacy of individuals. Many of these challenges interact in fundamental ways that are both surprising and encouraging. Together, they point to an exciting new future for knowledge discovery and data mining.
Bio David Jensen is Associate Professor of Computer Science and Director of the Knowledge Discovery Laboratory at the University of Massachusetts Amherst. His current research focuses on causal discovery in relational data, computational social network analysis, fraud detection, and privacy. He serves on the Executive Committee of the ACM Special Interest Group on Knowledge Discovery and Data Mining and on the program committees of the International Conference on Machine Learning and the International Conference on Knowledge Discovery and Data Mining. He is an associate editor of the ACM Transactions on Knowledge Discovery from Data. He serves on DARPA's Information Science and Technology (ISAT) Group. He recently served on a National Research Council panel assessing the research program of the National Institutes of Justice. From 1991 to 1995, he served as an analyst with the Office of Technology Assessment, an agency of the United States Congress. He received his doctorate from Washington University in St. Louis in 1992.

Konrad Feldman, CEO of Quantcast
Abstract As electronic communication, media and commerce increasingly permeate every aspect of modern life, real-time personalization of consumer experience through data-mining becomes practical. Effective classification, prediction and change modeling of consumer interests, behaviors and purchasing habits using machine learning and statistical methods drives efficiency, insights and consumer relevance that were never before possible. The internet has brought on a rapid evolution in advertising. Everything about behavior on the internet can be quantified and responses to behavior can occur in real time. This dynamic interaction with the user has created opportunities to better understand the way in which individuals move from awareness of a product to considering a purchase, through to intent and ultimately a sale for the marketer. When a marketer can answer the question "did those TV ads cause consumers to switch shampoo brands?" they can model behavior change and adjust marketing strategies accordingly. Underpinning this shift in how the world's trillion dollar marketing budget is spent is transactional data on an unprecedented scale, creating new challenges for software that must interpret this stream and make real time decisions tens, even hundreds of thousands of times every second. I will explore advances in modeling media consumption, advertising response and the real-time evaluation of media opportunities through reference to Quantcast, a business launched in September 2006 which today interprets in excess of 10 billion new digital media consumption records every day.
We will examine the challenges of applying machine learning to non-search advertising and in doing so explore the creation of business environments – organization, infrastructure, tools, processes (and cost considerations) – in which scientists can quickly develop new petabyte-scale algorithmic approaches, migrate them rapidly to real-time production and deliver fully customized experiences for marketers, publishers and consumers alike.
Bio Konrad Feldman, CEO, co-founded and launched Quantcast in 2006 along with Paul Sutter to transform the effectiveness of online advertising through the use of science and scalable computing. Prior to co-founding Quantcast, Feldman co-founded Searchspace (now Fortent), the leading provider of terrorist financing detection and anti-money laundering software for the world's financial services industry. As CEO of Searchspace's North American business, he established the business in the US and directed its rapid growth to become a market leader. Prior to Searchspace, Feldman was a Research Fellow in the Intelligent Systems Laboratory at University College London. Feldman holds a Bachelor of Science in Computer Science from University College London.
Ashok Srivastava, Intelligent Data Understanding group, NASA Ames Research Center
Abstract Modern aircraft are producing data at an unprecedented rate with hundreds of parameters being recorded on a second by second basis. The data can be used for studying the condition of the hardware systems of the aircraft and also for studying the complex interactions between the pilot and the aircraft. NASA is developing novel data mining algorithms to detect precursors to aviation safety incidents from these data sources. This talk will cover the theoretical aspects of the algorithms and practical aspects of implementing these techniques to study one of the most complex dynamical systems in the world: the national airspace.
Bio Ashok N. Srivastava, Ph.D. is the Principal Investigator for the Integrated Vehicle Health Management research project at NASA. His current research focuses on the development of data mining algorithms for anomaly detection in massive data streams, kernel methods in machine learning, and text mining algorithms.
Dr. Srivastava is also the leader of the Intelligent Data Understanding group at NASA Ames Research Center. The group performs research and development of advanced machine learning and data mining algorithms in support of NASA missions. He performs data mining research in a number of areas in aviation safety and application domains such as earth sciences to study global climate processes and astrophysics to help characterize the large-scale structure of the universe.
Dr. Srivastava is the author of many research articles in data mining, machine learning, and text mining, and has edited a book on Text Mining: Classification, Clustering, and Applications (with Mehran Sahami, 2009). He is currently editing two more books: Advances in Machine Learning and Data Mining for Astronomy (with Kamal Ali, Michael Way, and Jeff Scargle) and Data Mining in Systems Health Management (with Jiawei Han).
He has won numerous awards including the IEEE Computer Society Technical Achievement Award for "pioneering work in Intelligent Information Systems," the NASA Exceptional Achievement Medal for contributions to state-of-the-art data mining and analysis, the NASA Distinguished Performance Award, several NASA Group Achievement Awards, the IBM Golden Circle Award, and the Department of Education Merit Fellowship.

Francoise Fogelman-Soulie, VP Strategic Business Development, KXEN
Abstract Social Network Analysis has been one of the hottest topics among data mining scientists in the last 5 years. Meanwhile, more recently, companies, especially in Telco, have progressively started using these techniques to improve their predictive models. Through a few case studies, I will present the questions that SNA can address, the methodology we have used and the results which the companies obtained. I will then present other applications (in retail and social network sites), currently being deployed, with the scientific issues they raise.
Bio Francoise Soulie Fogelman is responsible for leading KXEN business development, identifying new business opportunities for KXEN and working with Product Development, Sales and Marketing to help promote KXEN's offer. She is also in charge of managing KXEN's University Program. Ms Soulie Fogelman has over 30 years of experience in data mining and CRM both from an academic and a business perspective. Prior to KXEN, she directed the first French research team on Neural Networks at Paris 11 University where she was a CS Professor. She then co-founded Mimetics, a start-up that processes and sells development environment, optical character recognition (OCR) products and services using neural network technology, and became its Chief Scientific Officer. After that she started the Data Mining and CRM group at Atos Origin and, most recently, she created and managed the CRM Agency for Business & Decision, a French IS company specialized in Business Intelligence and CRM. Ms Soulie Fogelman holds a master’s degree in mathematics from Ecole Normale Superieure and a PhD in Computer Science from University of Grenoble. She has advised over 20 PhDs on data mining, has authored more than 100 scientific papers and books and has been an invited speaker at many academic and business events.

Rayid Ghani, Researcher, Accenture Technology Labs
Abstract Many practical data mining applications deal with settings where the goal is to help human experts find rare cases that are of interest to them. Fraud Detection, Intrusion Detection, Surveillance for security applications, Information Filtering, and Recommender Systems are some examples of these applications. A common aspect among all of these problems is that they involve users (or experts) in an interactive classification setting, i.e. the experts are interacting with the results of the data mining system and in turn providing feedback that is valuable for the system. The competing goals of the data mining system are to make these experts more efficient and effective in performing their task and to get feedback that allows it to improve itself over time. In this talk, I will describe this interactive data mining setting, give examples of case studies where this setting applies, and show how data mining techniques help manage this tradeoff to build practical interactive systems that are not only useful but also improve over time.

John F. Elder IV, Chief Scientist, Elder Research, Inc.
Abstract If your health and finances are sufficiently poor, the Social Security Administration will send you taxpayer dollars to help out. But, applying and qualifying can be a long and frustrating process - sometimes taking up to two years! In the meantime, your health and finances are undoubtedly worsening. (Likely the reason half of those appealing a rejection eventually get approved; the lack of timely help ensures their deterioration.) Yet, by mining the important text of the applications, the SSA can identify those most likely to be approved upon analyst review, and put them in a much more efficient fast track - helping all applicants. The solution involves text extraction, token collocation, Bayesian inference, and a new way to combine evidence.
Bio Dr. John Elder heads a data mining consulting team with offices in Charlottesville Virginia, Washington DC, Mountain View California, and Manhasset New York. Founded in 1995, Elder Research, Inc. focuses on investment, commercial and security applications of advanced analytics, including text mining, forecasting, stock selection, image recognition, process optimization, cross-selling, biometrics, drug efficacy, credit scoring, market timing, and fraud detection.
John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems Engineering from the University of Virginia, where he’s an adjunct professor teaching Optimization or Data Mining. Prior to 15 years at ERI, he spent 5 years in aerospace defense consulting, 4 heading research at an investment management firm, and 2 in Rice University's Computational & Applied Mathematics department.
Dr. Elder has authored innovative data mining tools, is a frequent keynote speaker, and was co-chair of the 2009 Knowledge Discovery and Data Mining conference, in Paris. John’s courses on analysis techniques -- taught at dozens of universities, companies, and government labs -- are noted for their clarity and effectiveness. Dr. Elder was honored to serve for 5 years on a panel appointed by the President to guide technology for National Security. His book with Bob Nisbet and Gary Miner, Handbook of Statistical Analysis & Data Mining Applications, won the PROSE award for Mathematics in 2009. His book with Giovanni Seni, Ensemble Methods in Data Mining: Improving Accuracy through Combining Predictions, was published in February 2010.

R Bharat Rao, Balaji Krishnapuram, Murat Dundar, Siemens Healthcare
Abstract The last century has seen a massive increase in the accuracy and sensitivity of diagnostic tests: from observing external symptoms, to precise laboratory panels, to complex imaging methods for non-invasive internal examinations, to, in the very near future, the use of genomic and molecular analysis at the bedside. This improved diagnostic accuracy has resulted in an exponential increase in the patient data available to the physician. Furthermore, medical knowledge is continuously growing, with physicians being flooded with an expanding array of new tests, updated clinical guidelines on how to diagnose and treat patients, and evidence-based results from clinical trials. Both these trends – the increase in patient data and medical knowledge – will only intensify, as healthcare transforms into the practice of increasingly personalized medicine.
There is a tremendous opportunity for data mining methods to assist the physician, improve patient care, control costs, and ultimately to save lives. In this talk we will provide an overview of the special challenges faced in launching new healthcare data mining products, and identify a few key takeaways for entrepreneurs who want to create new businesses in this domain. We begin by analyzing the clinical need for products to mine medical images to enable radiologists to identify cancers and other medical conditions in asymptomatic patients, and thus begin treatment as early as possible. The next step is personalized therapy selection, which requires data mining methods to mine different patient data sources, including images, free text, labs, pharmacy, molecular & genomic data. We discuss how to determine the scope and market size for products such as these, and identify the key methodological issues we have tackled. We focus on the clinical, regulatory and marketing challenges that we have had to solve over the last decade, as we have gone from concepts to deployed products that are used today in thousands of patient encounters worldwide. We conclude by highlighting results that demonstrate the impact of data mining on patient care and improved outcomes.
Bio Dr. R. Bharat Rao is the Director of Knowledge Solutions in the Health Services Division of Siemens Healthcare. Headquartered in Malvern, PA, USA, Knowledge Solutions focuses on developing products and services that (a) help improve patient outcomes by integrating medical knowledge with various parts of a patient record (free text, images, labs, pharmacy, genomics, etc.), and (b) support the increasing drive to personalize medicine.
Dr. Rao received a B.Tech in Electronics Engineering from the Indian Institute of Technology, Madras in 1985, and an M.S. and Ph.D. focusing on machine learning from the Dept. of Electrical Engineering, University of Illinois, Urbana-Champaign, in 1993. He joined Siemens Corporate Research in 1993, and formed the Data Mining group there in 1996. In 2002, he moved to Siemens Healthcare to help found the Computer-Aided Diagnosis & Knowledge Solutions group.
Dr. Rao's research interests include probabilistic inference, machine learning, natural language processing, classification, and graphical models, with a focus on developing decision-support systems that can help physicians improve the quality of patient care. He is particularly interested in the development of novel data mining methods to collectively mine the structured and unstructured parts of a patient record and the automatic integration of medical domain knowledge into the mining process. He has published over 100 papers in peer-reviewed scientific journals and conferences in machine learning and medicine and has filed over 50 patents. In 2005, Siemens honored him with its "Inventor of the Year" award for “outstanding contributions related to improving the technical expertise and the economic success of the company” for developing the REMIND™ (Reliable Extraction and Meaningful Inference from Nonstructured Data) Platform. The REMIND Platform supports both the integration of knowledge into medical decision-support and the discovery of novel medical knowledge to support personalized medicine. He has twice received the IEEE Data Mining Practice Prize for the best deployed industrial and government data mining application, in 2005 (for the REMIND Platform) and 2009 (for Computer-Aided Diagnosis applications).
(Privacy-friendly!) Social Network Targeting for On-line Advertising
Foster Provost, Professor, Leonard N. Stern School of Business, New York University
Abstract I will discuss privacy-friendly methods for finding good audiences for on-line display advertising, by extracting quasi-social networks from browser behavior on user-generated content sites. Targeting social-network neighbors resonates well with advertisers, and on-line browsing behavior data counterintuitively can allow the identification of good audiences anonymously. I will discuss methods for extracting quasi-social networks from data on visitations to social media pages. The data are completely anonymous with respect to both browser identity and content. I will introduce measures for computing which browsers are "close" to other browsers that in the past have exhibited brand affinity. Results show that audiences with high brand proximity indeed show substantially higher brand affinity themselves, as well as higher propensity to convert. Time permitting, I also will present additional findings relating to whether the quasi-social network actually embeds a true social network, how to gather appropriate training data, and whether on-line advertising actually is effective. This work was done in collaboration with Michael Barnathan, Brian Dalessandro, Rod Hook, Alan Murray, Claudia Perlich, and Xiaohan Zhang.
Bio Foster Provost is Professor, NEC Faculty Fellow, and Paduano Fellow of Business Ethics (Emeritus) at the NYU Stern School of Business. He is Chief Scientist for Coriolis Ventures, a NYC-based early stage venture and incubation firm. In 2001 he was Program Chair of the KDD Conference, and he just retired as Editor-in-Chief of the journal Machine Learning. His main research interests these days include predictive modeling with (social) network data, and alternative methods for data acquisition for data mining. Foster has applied data mining in practice to applications including on-line advertising, fraud detection, network diagnosis, targeted marketing, counterterrorism, and others. His work has won best paper awards at KDD, IBM Faculty Awards, and a President's Award at NYNEX Science and Technology. Last year his work on social network-based marketing systems won the 2009 INFORMS Design Science Award.
Claudia Perlich, Chief Scientist, Media6Degrees
Abstract In 2009 IBM was recognized as a finalist of the INFORMS Edelman competition for its predictive modeling initiative to improve the productivity of its global salesforce, with an estimated business impact of ~100 million dollars. The first component implements some traditional propensity modeling to identify new sales opportunities and is currently used by over 13,000 sales reps. The second 'wallet estimation' component is used strategically to allocate sales resources based on validated analytical estimates of revenue opportunity. In this case study we cover the key elements leading to the success, including data integration, data mining and predictive modeling, solution delivery, human-guided model validation, and integration of the business process, and we conclude with an assessment of the bottom-line business impact.
Bio Prior to joining Media6Degrees, Claudia spent five years working at the Data Analytics Research group at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She has authored over 30 scientific publications and holds multiple patents in the area of machine learning. Claudia has won many data mining competitions, including the prestigious 2007 KDD CUP on movie ratings, the 2008 KDD CUP on breast-cancer detection, and the 2009 KDD CUP on churn and propensity predictions for telecommunication customers. Claudia received her Ph.D. in Information Systems from Stern School of Business, New York University in 2005 and holds a Master of Computer Science from Colorado University.






