KDD 2011 Industry Practice Expo Speakers

Title: The Practitioner’s Viewpoint to Data Mining - Key Lessons Learned in the Trenches and Case Studies
Speaker: Richard Boire
In many data mining exercises, we see information that appears on the surface to demonstrate a particular conclusion. But closer examination of the data reveals that these results are in fact misleading. In this session, we will examine this notion of misleading results in three areas:
  • Statistical Issues
  • Overstating of Results
  • Overfitting
Furthermore, we will examine two case studies which explore the use of data in enhancing marketing efforts both from an upsell as well as a retention standpoint.
Statistical Issues
Statistical issues such as multicollinearity and outliers can impact results dramatically. We will first outline how these statistical issues can produce misleading results. At the same time, we will demonstrate how the data mining practitioner overcomes these issues through data analysis approaches that give the business community results that are both more meaningful and not misleading.
Overstating of Results
From a business standpoint, we will also look at results that appear to be too good to be true. In other words, there appears to be some overstating of results within a given data mining solution. Initially, we will discuss how to identify these situations. Secondly, we will outline what causes this overstatement of results and detail our approach on how we would overcome this predicament.
Overfitting
Another topic for discussion is overfitting of results. This is particularly the case when building predictive models. In this section of the seminar, we will define what overfitting is and why it is becoming more important for the business community to understand. Once again, analytical approaches will be discussed in terms of how best to handle this issue.
Case Studies
The first case study comes from the financial services area, where the organization was experiencing challenges in profitably upselling regular credit card customers to a gold card. In this case study, we demonstrate how our four-step approach is applied in arriving at a solution to this challenge. These four steps are as follows:
  • How to identify the problem
  • How we construct the right data environment to conduct our analytics
  • What kind of analytics are employed which include techniques such as:
    • Correlation analysis
    • EDA Reports
    • Logistic Regression
    • Gains Charts
    More importantly, we discuss how to interpret the output in terms of the actual impact to the business (i.e., increased response rate and ultimately increased ROI).
  • How do we apply the learning to a future initiative and what were the actual results
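As a rough illustration of the gains charts named in the list above, the sketch below ranks customers by model score and reports how many responders each slice of the ranking captures. It is not the speaker's method: the function name `cumulative_gains`, the scores, and the response flags are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def cumulative_gains(scores, responded, n_bins=10):
    """Rank customers by model score and report, bin by bin, the cumulative
    share of customers contacted and the cumulative share of responders captured."""
    # Highest-scoring customers come first.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    total_responders = sum(flag for _, flag in ranked)
    if total == 0 or total_responders == 0:
        raise ValueError("need at least one customer and one responder")
    rows = []
    captured = 0
    for b in range(1, n_bins + 1):
        # Integer arithmetic keeps the bins as even as possible.
        start = (b - 1) * total // n_bins
        end = b * total // n_bins
        captured += sum(flag for _, flag in ranked[start:end])
        rows.append((b, end / total, captured / total_responders))
    return rows


# Hypothetical model scores and response flags (1 = responded) for ten customers.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, responded, n_bins=5)
for bin_number, share_contacted, share_captured in gains:
    print(f"bin {bin_number}: contacted {share_contacted:.0%}, captured {share_captured:.0%}")
```

In this made-up data the top 20% of customers by score contain 50% of the responders, whereas random selection would be expected to capture about 20%. That gap is what a gains chart translates into response rate and ROI terms.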
The second case study involves the problem of retaining customers for a travel-related company. Adopting the same four-step approach, we demonstrate the model’s benefit in improving the ROI of any customer initiative. In this initiative, we also demonstrate how, in some cases, models can be very robust in terms of their performance longevity.

Speaker Bio:
Richard Boire's experience in database marketing and predictive analytics dates back to 1983, when he received an MBA from Concordia University in Finance and Statistics. His initial experience at organizations such as Reader’s Digest and American Express allowed him to become a pioneer in the application of predictive modelling technology for all direct marketing programs. This extended to the introduction of models which targeted the acquisition of new customers based on return on investment. With this experience, Richard formed his own consulting company in 1994, which is now called the Boire Filler Group, a Canadian leader in offering analytical and database services to companies seeking solutions to their existing predictive analytics or database marketing challenges. Richard is a recognized authority on predictive analytics and is considered one of the top five experts in this field in Canada, with expertise and knowledge that are difficult, if not impossible, to replicate in Canada. This expertise has evolved into international speaking assignments and workshop seminars in the U.S., England, Eastern Europe, and Southeast Asia. Within Canada, he gives seminars on segmentation and predictive analytics for such organizations as the Canadian Marketing Association (CMA), Direct Marketing News, Direct Marketing Association Toronto and the Association for Advanced Relationship Marketing (AARM). His written articles have appeared in numerous Canadian publications such as Direct Marketing News, Strategy Magazine, and Marketing Magazine. He has taught applied statistics, data mining and database marketing at a variety of institutions across Canada, including the University of Toronto, George Brown College, and Seneca College. Richard is currently Chair of the CMA’s Customer Insight and Analytics Committee and sits on the CMA’s Board of Directors.
He has chaired numerous full-day conferences on behalf of the CMA (the 2000 Database and Technology Seminar, the 2002 Database and Technology Seminar, and the first-ever Customer Profitability Conference in 2005). He has co-authored white papers on the following topics: ‘Best Practices in Data Mining’ as well as ‘Customer Profitability: The State of Evolution among Canadian Companies’.

Title: Real-Time Risk Control System for CNP (Card Not Present)
Speaker: Tai Hsu
AliExpress is an online e-commerce platform for wholesale products. Credit cards are one of its payment methods. An online transaction using a credit card is called a "card not present" (CNP) transaction, because the physical card is not swiped through a reader. CNP fraud is also the major type of credit card fraud, imposing significant overhead on the online operation, sellers, and buyers. To protect customers on our platform, we developed a real-time credit card fraud detection system using machine learning technologies, which allows us to achieve a precision of 97% at a recall of 80%. With the system, we can provide the best online shopping experience for our customers, without the high risk of online transactions, which always results in a high operational cost. We will briefly share our experience and practice in the expo.
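For readers less familiar with the two metrics quoted above, the following sketch shows how precision and recall are computed from confusion-matrix counts. The counts are hypothetical and were picked only so that the result lands near the quoted 97% and 80%; they are not AliExpress data, and the function name `precision_recall` is an assumption made for illustration.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from confusion-matrix counts."""
    flagged = true_positives + false_positives        # everything the system flagged
    actual_fraud = true_positives + false_negatives   # everything that really was fraud
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual_fraud if actual_fraud else 0.0
    return precision, recall


# Hypothetical counts: 1,000 fraudulent transactions, of which 800 are caught,
# plus 25 legitimate transactions flagged by mistake.
precision, recall = precision_recall(
    true_positives=800, false_positives=25, false_negatives=200
)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision answers "of the transactions flagged, how many were truly fraudulent?", while recall answers "of the fraudulent transactions, how many were flagged?". A system tuned for high precision keeps false alarms for legitimate buyers low, at the cost of letting some fraud through.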

Speaker bio:
Dr. Tai Hsu’s research and specialized areas cover algorithms, artificial intelligence, chemistry, computational biology, cybernetics, data mining, machine learning, robotics, and supercomputing. His work for AliExpress.com significantly reduced online risk, to a level competitive with Cybersource’s. His work in machine 3D vision won the best paper award (European Meeting on Cybernetics and Systems Research, 2008 and 2006). His research in computational biology won the NLM award (National Library of Medicine, of the National Institutes of Health, 2001). His work in quantum chemistry won the best paper award (Journal of Chinese Chemical Society, Taiwan). He was named the distinguished employee (2001) of Providian Financial Corporation. Tai is currently a principal scientist of Alibaba Group, as well as the R&D director of Northwestern Polytechnic University. Prior to Alibaba, he worked for several Fortune 500 companies in the San Francisco Bay Area. He holds a Ph.D. in CS from Oregon State University, an MBA from Northwestern Polytechnic University, a B.A. in CS plus a minor in math from Wartburg College, and a B.S. in chemistry from National Taiwan University.

Title: Accelerating Large-Scale Data Mining using In-Database Analytics
Speakers: Mario E. Inchiosa and Michele Chambers
In more and more industries, competitive advantage hinges on exploiting the largest quantity of data in the shortest possible time - and doing so cost-effectively. Data volumes are growing exponentially, while businesses are striving to deploy sophisticated and computationally intensive predictive analytics. Often, massive data is stored in a data warehouse running on dedicated parallel hardware, but advanced analytics is performed on a separate compute platform. Moving data from the data warehouse to the compute environment can constitute a significant bottleneck. Organizations resort to considering only a fraction of their data or refreshing their analyses infrequently. To address the data movement bottleneck and take full advantage of parallel data warehouse platforms, vendors are offering new in-database analytics capabilities. They are opening up their platforms, allowing users to run their own user-defined functions and statistical models as well as vendor- and partner-supplied advanced analytics on the database platform, close to the data, in parallel, without transporting the data through a host node or corporate network. In this talk, we will present the need for in-database analytics and discuss a number of the new solutions available, highlighting case studies where solution times have been reduced from hours to minutes or seconds.

Speaker bios:
Dr. Mario E. Inchiosa is U.S. Chief Scientist at Netezza, an IBM Company, where he develops data-intensive high performance computing appliances. His work focuses in particular on the juncture of data warehousing and parallelized advanced analytics and optimization. Dr. Inchiosa received an A.B. in Physics from Harvard College and an A.M. and Ph.D. in Physics from Harvard University. At Harvard, he combined his dual interests in Physics and Computer Science by applying statistical physics to the study of neural network associative memories. He moved on to study the dynamics of neural network associative memories as a post-doc at the Technical University of Munich. Next, he joined SPAWAR Systems Center San Diego, specializing in stochastic non-linear dynamics, signal detection, Monte Carlo simulation, and high performance computing. He was awarded four patents as a result of his research, and he has published over 30 papers, earning Publication of the Year and Technical Publication Excellence awards. In 2001 Dr. Inchiosa joined BiosGroup, a Santa Fe Institute complexity science spin-off (subsequently NuTech Solutions), applying evolutionary algorithms and swarm-like agent based modeling to problems in business and government. He developed pipeline simulation and optimization engines, served as Principal Investigator researching general and geospatial reasoning under uncertainty, and used agent based models to study global market dynamics and co-evolutionary business strategy optimization. As NuTech’s Chief Science Officer, Dr. Inchiosa was involved with Netezza’s acquisition of NuTech as part of Netezza’s strategy to bring advanced analytics capabilities to data warehouse appliances.

Michele Chambers is an entrepreneurial executive with 20 years of technology experience and is the General Manager & Vice President of Analytic Solutions at Netezza, an IBM Company. The Analytic Solutions team is responsible for working with customers to fully exploit the IBM Netezza appliance via scalable, high performance advanced analytics on IBM Netezza's parallel computing platform. Michele's passion is helping companies identify new areas to apply analytics, especially optimization, that drive high business value and create sustainable differentiation in the market. Michele has a strong focus on results and growth and has successfully launched several lines of business, including Analytic Solutions at Netezza. Additionally, Michele successfully built a packaged SAP solutions business resulting in over $10M revenue in the first year and an early supply chain execution software business. In her spare time, Michele, who is a single mother, loves to show her precocious tween the world and challenge him to make the world a better place by applying his mathematical talents to solve real world problems. Michele holds a B.S. in Computer Engineering from Nova Southeastern University and an MBA from Duke University.

Title: Operational Security Analytics - Doing More with Less
Speaker: Colleen McCue
Why just count crime when you can anticipate, prevent and respond more effectively? Companies in the commercial sector have long understood the importance of being able to anticipate or predict future behavior and demand in order to respond efficiently and effectively. Embracing the promise of predictive analytics, the public safety community is moving from a focus on "what happened" to a system that enables the ability to anticipate future events and effectively deploy resources in front of crime, thereby changing outcomes. While we have become familiar with the use of advanced analytics in support of fraud detection and prevention, techniques similar to those used to support customer loyalty programs and supply chain management have been used to prevent and solve violent crimes, enhance investigative pace and efficacy, support information-based risk and threat assessment, and deploy public safety resources more efficiently. As public safety agencies increasingly are asked to do more with less, the ability to anticipate crime represents a game-changing paradigm shift, enabling information-based tactics, strategy and policy in support of prevention and response. Reporting, collecting and compiling data are necessary but not sufficient for increasing public safety. Ultimately, the ability to anticipate, prevent and respond more effectively will enable us to do more with less and change public safety outcomes.

Speaker Bio:
Dr. Colleen McLaughlin McCue, GeoEye Analytics, brings over 18 years of experience in advanced analytics and the development of actionable solutions to complex information processing problems in the applied public safety and national security environment. Her areas of expertise include the application of data mining and predictive analytics to the analysis of crime and intelligence data, with particular emphasis on deployment strategies, surveillance detection, threat and vulnerability assessment, fraud detection, geospatial predictive analytics, and the behavioral analysis of violent crime. Dr. McCue's experience in the applied law enforcement setting and pioneering work in operationally relevant analytical strategies has been used to support a wide array of national security and public safety clients. Dr. McCue has published her research findings in journals and book chapters, and has authored a book on the use of advanced analytics in the applied public safety environment entitled, Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis. Dr. McCue earned her undergraduate degree from the University of Illinois at Chicago and Doctorate in Psychology from Dartmouth College. She completed a five-year postdoctoral fellowship in the Department of Pharmacology & Toxicology at the Medical College of Virginia at the Virginia Commonwealth University.

Title: The Power of Analysis and Data
Speaker: David Norton
Caesars Entertainment, the largest provider of branded casino entertainment, captures a wealth of data for 40 million+ customers through its Total Rewards program. In-depth data analysis has helped Caesars weather the economic downturn by prioritizing marketing spend, expense savings targets and identifying new revenue opportunities. This talk will describe how closed-loop marketing, state-of-the-art user segmentation, and ongoing experimentation via test and control groups have enabled Caesars Entertainment to achieve all-time high customer satisfaction scores and outperform the competition in a challenging economic climate. The lessons learned are generic and apply across multiple industries. Insights will also be provided on the next wave of challenges to be answered analytically.

Speaker bio:
David Norton is the Senior Vice President and Chief Marketing Officer at Caesars Entertainment, which operates more than 40 casinos nationwide and 10 others worldwide and has been recognized for its outstanding marketing practices by the Wall St. Journal, Info Week and CIO Magazine. HET’s brands include Caesars, Horseshoe, Harrah’s, World Series of Poker, Paris, Flamingo and several others. Norton is responsible for the company’s direct marketing strategy, Brand Management, Promotions, Alliances, Research, VIP marketing, revenue management, the Total Rewards loyalty program, Internet marketing, multi-cultural marketing, mobile initiatives, Retail, Entertainment, Sales and Travel Services. Prior to joining Harrah’s in October of 1998, Norton worked in the credit card industry with American Express, Household International and MBNA. He has a B.S. in Finance from Boston College, an MBA from Loyola College and a Masters in Management of Technology from the University of Pennsylvania and the Wharton School.

Title: Knowledge Discovery and Data Mining in Pharmaceutical Cancer Research
Speaker: Paul Rejto
Biased and unbiased approaches to develop predictive biomarkers of response to drug treatment will be introduced and their utility demonstrated for cell cycle inhibitors. Opportunities to leverage the growing knowledge of tumors characterized by modern methods to measure DNA and RNA will be shown, including the use of appropriate preclinical models and selection of patients. Furthermore, techniques to identify mechanisms of resistance prior to clinical treatment will be discussed. Prospects for systematic data mining and current barriers to the application of precision medicine in cancer will be reviewed along with potential solutions.

Speaker bio:
Dr. Rejto is Director of Computational Biology, Oncology Research Unit, Pfizer La Jolla. His research interests include computational methods to support target discovery and validation, animal models, patient selection, resistance modeling and combination therapy. Paul trained in physical and theoretical chemistry (Harvard A.B. magna cum laude; Stanford Ph.D.; UC Berkeley post-doc) and joined Pfizer La Jolla (Agouron) in 1994. During his career, Paul has developed and applied tools for structure-based drug design, led a team that progressed compounds for the treatment of diabetes into the clinic, and built the computational biology group at Pfizer La Jolla. Coaching youth soccer has taught him about leadership.

Title: Broad Scale Predictive Modeling and Marketing Optimization in Retail Sales
Speakers: Dan Steinberg and Felipe Fernandez Martinez
The challenge of predicting retail sales on a product-by-product basis throughout a network of retail stores has been researched intensively by applied econometricians and statisticians for decades. The principal tools of analysis have been linear regression with Bayesian-inspired adjustments to stabilize demand curve estimates. The scale of such analytics can be challenging, as retailers often work with more than 100,000 products (SKUs) and typically operate networks of hundreds of brick and mortar stores. Department and grocery stores are excellent examples, but fast food restaurants also require such detailed predictive modeling systems. Depending on the objectives of the company, predictions may be required for blocks of time spanning a week or more, or, as in the case of fast food operators, predictions are required for each 15-minute time interval of the operating day. The authors have modernized industry standard approaches to such predictive modeling by leveraging advanced data mining techniques and constructing a completely automated prediction and optimization system. The modern techniques are more adept at detecting nonlinear response, accommodating interactions, and automatically sifting through hundreds if not thousands of potential factors influencing sales outcomes. Our results confirm that conventional statistical models miss a substantial fraction of the explainable variance and that the modern methods dominate in terms of performance and speed of model development.
Accurate prediction is required for reliable planning and logistics, and is also essential for optimization. Optimization with respect to pricing, promotion and assortment can be asked for relative to a variety of objectives (e.g. revenue, profits) and short term and long-term optimization may result in different decisions being taken. A unique challenge for retailers is encountered in the large number of constraints to which most complex retail organizations are subject. Contracts and special understandings with valued suppliers will severely constrain a retailer’s flexibility. For example, certain products may not be promotable (or discounted) in isolation, and others (say from competitors) may not be promoted jointly, and the costs of goods sold may well depend on the quantities contracted. We discuss how we have resolved such challenges via a cycle of prediction and simulation driven from a database to develop a flexible high speed system that can deal with arbitrary constraints, arbitrary objectives, and achieve new levels of predictive accuracy and reliability.

Speaker Bios:
Dan Steinberg is CEO and founder of Salford Systems, the developer of the CART® decision tree, MARS® spline regression, TreeNet® gradient boosting, Breiman's RandomForests®, and other influential data mining technology. After earning a PhD in Econometrics at Harvard, Dan began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then as Assistant Professor of Economics at the University of California, San Diego. His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including CitiBank, Chase, American Express, and Credit Suisse, and has included projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Dan led the teams that won first place awards in the KDDCup 2000 and the 2002 Duke/TeraData Churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. Dan has published papers in economics, econometrics, and computer science journals, and contributes actively to the ongoing R & D at Salford.

Felipe Fernandez Martinez obtained a degree in Chemical Engineering at the Universidad Michoacana de San Nicolás de Hidalgo in Morelia, México and subsequently completed an MBA at the Instituto Panamericano de Alta Dirección de Empresas (IPADE) and certificates in corporate finance at ESCP Europe and the Instituto Tecnológico Autónomo de México (ITAM). Felipe worked at Carrefour, the world's 2nd largest retailer, for over 12 years. He was Carrefour’s Director of Strategic Projects for Latin America with responsibility for Cost optimization, Procurement, Pricing, Supply Chain, and implementation of new analytics tools for Carrefour, Brazil. Prior to joining Carrefour Latin America, Felipe worked for the Carrefour Group in senior level positions in Paris, Italy and Mexico as Cost Optimization Director. Felipe currently is CEO and partner at Interefe, where he advises retailers on projects turning complexity into competitive advantages in three main areas of expertise: Energy efficiency, Analytics and Cost optimization. Felipe is fluent in Spanish, French, Portuguese, Italian and English.

Title: Applications of Data Mining and Machine Learning in Online Customer Care
Speakers: Ravi Vijayaraghavan and P V Kannan
With the coming of age of the web as a mainstream customer service channel, B2C companies have invested substantial resources in enhancing their web presence. Today customers can interact with a company not only through the traditional phone channel but also through chat, email, social media or web self-service. With the availability of web logs, CRM data and text transcripts, these online channels are rich with data, and they track several aspects of customer behavior and intent. 24/7 Customer Innovation Labs has developed a series of data mining and statistics driven solutions to improve customer experience in each of these online channels [1].
This talk will focus on solutions we have developed to enhance the performance of web chat as a customer service channel. Two stages of the customer life cycle will be considered for the purpose of this study: new customer acquisition (or sales) and service of existing customers. In customer acquisition, the key objective is to maximize "incremental" revenues through the chat channel, while in customer service the objective is to drive up the quality of customer experience (as measured by customer satisfaction surveys or mined customer sentiments) through chat. In both these scenarios, applications of data mining/text mining and machine learning have been developed and deployed in:
  1. Real-time targeting of the right visitors to chat
  2. Predicting customer needs
  3. Routing customer to the customer service representatives with the right skill set
  4. Mining chat transcripts and Social Media Portals to identify key customer issues and customer sentiments
  5. Mining representatives’ responses to identify opportunities for improving performance
  6. Feeding back learning from 4 and 5 to 1 (better targeting)
Real-life case studies will be shown to demonstrate that this closed loop solution can quickly improve the customer care systems’ performance on key metrics such as sales revenue, customer satisfaction, loyalty and retention.

[1] Vijayaraghavan, Ravi et al., Predictive Systems for Customer Interactions, Service Systems Implementation, Chapter 18, Springer, 2011.

Speaker Bios:
Ravi Vijayaraghavan is a Vice-President at 24/7 Customer Innovation Labs where he leads the Analytics and Data Sciences Organization. His team builds data-driven solutions and predictive systems that enable superior customer acquisition and customer service through online and offline channels. Prior to 24/7 Customer, Ravi was at Ford Motor, where he started his career at Ford Research Laboratories. His research at Ford was in the application of large scale numerical computations for engineering design. Later, he took up a position in the IT Strategy organization. In each of these roles he drove the use of mathematical and quantitative methods to improve decision-making capability. Most recently he led the development and implementation of an analytics-driven solution to improve the profitability of Ford of Brazil. Ravi was also a Vice President and part of the executive leadership team of Mu Sigma Inc., a Chicago based pure-play analytical services company, where he was responsible for client management. As a researcher, Ravi has several refereed and invited publications in major scientific and technical journals and has presented as an invited speaker at several international conferences. He has served on leadership committees in academic societies such as Sigma Xi. In 2004, he was the recipient of a Henry Ford Technology award, the highest technical recognition at Ford Motor Company. Ravi holds a B.Tech degree from Indian Institute of Technology, Madras, a PhD in Engineering from University of Wisconsin-Madison and an MBA (with high distinction) in Strategy and Finance from Ross School of Business, University of Michigan, Ann Arbor.

PV Kannan co-founded 24/7 Customer in 2000. 24/7 Customer is the first in the industry to provide services across the entire customer lifecycle. 24/7 Customer provides its services to global 1000 companies, round the clock from locations around the world. PV’s vision is to make it simple and easy for companies to acquire and retain their customers, and for customers to seek solutions to their queries, anytime and anywhere in the world. PV has been featured as a thought leader in the field of global sourcing, including in Tom Friedman's “The World Is Flat”, “Outsourcing Thought Leaders” by Booz Allen Hamilton, and Fortune Magazine. He is also a regular speaker at industry events and has been a panelist at Forbes Conference 2006 and the 2006 Annual Meeting of the Academy of Management, to name a few. PV's career in on-line and off-line customer sales and service has resulted in a number of U.S. and international patent filings. Prior to 24/7 Customer, PV Kannan was an officer and VP at Kana Software. In 1995, he founded his first company, Business Evolution (BEI), based in Princeton, New Jersey, which was acquired by Kana Software in 1999. Prior to Business Evolution, PV worked in the U.S. and Europe in various leadership roles overseeing IT projects.

Speaker: David Reiley
The department-store retailer John Wanamaker famously stated, “Half the money I spend on advertising is wasted—I just don’t know which half.” Compared with the measurement of advertising effectiveness in traditional media, online advertisers and publishers have considerable data advantages, including individual-level data on advertising exposures, clicks, searches, and other online user behaviors. However, as I shall discuss in this talk, the science of advertising effectiveness requires more than just quantity of data - even more important is the quality of the data. In particular, in many cases, using various statistical techniques with observational data leads to incorrect measurements. To measure the true causal effects, we run controlled experiments that suppress advertising to a control group, much like the placebo in a drug trial. With experiments to determine the ground truth, we can show that in many circumstances, observational-data techniques rely on identifying assumptions that prove to be incorrect, and they produce estimates differing wildly from the truth. Despite increases in data availability, Wanamaker's complaint remains just as true for online advertising as it was for print advertising a century ago.
In this talk, I will discuss recent advances in running randomized experiments online, measuring the impact of online display advertising on consumer behavior. Interesting results include the measurable effects of online advertising on offline transactions, the impact on viewers who do not click the ads, the surprisingly large effects of frequency of exposure, and the heterogeneity of advertising effectiveness across users in different demographic groups or geographic locations. I also show that sample sizes of a million or more customers may be necessary to get enough precision for statistical significance of economically important effects - so we have just reached the cusp of being able to measure effects precisely with present technology. (By comparison, previous controlled experiments using split-cable TV systems, with sample sizes in the mere thousands, have lacked statistical power to measure precise effects for a given campaign.) As I show with several examples that establish the ground truth using controlled experiments, the bias in observational studies can be extremely large, over- or underestimating the true causal effects by an order of magnitude. I will discuss the (implicit or explicit) modeling assumptions made by researchers using observational data, and identify several reasons why these assumptions are violated in practice. I will also discuss future directions in using experiments to measure advertising effectiveness.
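To give a feel for why such experiments need very large samples, here is a back-of-the-envelope power calculation using the textbook formula for comparing two group means, n = 2((z₁₋α/₂ + z_power)·σ/δ)² per group. It is a sketch under stated assumptions, not the speaker's analysis: the function name `customers_per_group`, the $20 standard deviation of per-customer spend, and the $0.10 average lift are hypothetical figures chosen only for illustration.

```python
import math
from statistics import NormalDist


def customers_per_group(std_dev, min_effect, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of a
    difference in means, assuming equal variance in both groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) * std_dev / min_effect) ** 2)


# Hypothetical inputs: per-customer spend varies with a standard deviation
# of $20, and the ad is assumed to lift average spend by only $0.10.
n = customers_per_group(std_dev=20.0, min_effect=0.10)
print(f"customers needed per group: {n:,}")
```

With these made-up inputs, the requirement comes out in the hundreds of thousands of customers per group, so a treatment-plus-control experiment exceeds a million in total. Because the required size grows with the square of σ/δ, halving the detectable effect roughly quadruples the sample needed.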

Speaker Bio:
David Reiley is a Principal Research Scientist at Yahoo! Research, where he is using experiments to measure the effects of display advertising. A leader in the field-experiments revolution in economics and the social sciences, he has published experiments on diverse topics, from auction bidding to charitable fundraising, in leading economics journals. Before coming to Yahoo!, Reiley was Arizona Public Service Professor of Economics at the University of Arizona. He has also taught at Vanderbilt University and at the Kellogg School of Management at Northwestern University. Reiley is the Co-Editor for Field Experiments at Economic Inquiry, and a coauthor (with Avinash Dixit and Susan Skeath) of the best-selling game theory textbook Games of Strategy, Third Edition. He holds a bachelor’s degree in Astrophysical Sciences from Princeton University, and a PhD in economics from MIT.

Speaker: John Elder
Meaningful work is a deep human need. We all yearn to contribute to something greater than ourselves, be listened to, and work alongside friendly peers. Data mining consulting is a powerful way to use technical skills and gain these great side benefits. The power of analytics and its high return on investment make one's expertise welcome virtually everywhere. And the variety of projects and domains encountered leads to continual learning as new problems are met and solved. Teaching and writing are possible, and there is great satisfaction in seeing one's work actually implemented and used, potentially touching millions.
Still, in industry, one has the joy and hazards of working closely with other humans, where final success can depend as much on others as on oneself, and on social as well as technical issues. In my experience, business risk strongly outweighs technical risk in determining whether a solution is used. I will share some hard-won lessons learned on how to best succeed, both technically and socially, in the results-oriented world of industry.

Speaker Bio:
Dr. John Elder heads a data mining consulting team with offices in Charlottesville, Virginia, and Washington, DC (www.datamininglab.com). Founded in 1995, Elder Research, Inc. focuses on investment, commercial and security applications of advanced analytics, including text mining, stock selection, image recognition, process optimization, cross-selling, biometrics, drug efficacy, credit scoring, market timing, and fraud detection.
John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems Engineering from the University of Virginia, where he’s an adjunct professor teaching Optimization or Data Mining. Prior to 15 years at ERI, he spent 5 years in aerospace defense consulting, 4 heading research at an investment management firm, and 2 in Rice University's Computational & Applied Mathematics department.
Dr. Elder has authored innovative data mining tools, is a frequent keynote speaker, and was co-chair of the 2009 Knowledge Discovery and Data Mining conference in Paris. John’s courses on analysis techniques -- taught at dozens of universities, companies, and government labs -- are noted for their clarity and effectiveness. Dr. Elder was honored to serve for 5 years on a panel appointed by the President to guide technology for National Security. His book with Bob Nisbet and Gary Miner, Handbook of Statistical Analysis & Data Mining Applications, won the PROSE award for Mathematics in 2009. His book with Giovanni Seni, Ensemble Methods in Data Mining: Improving Accuracy through Combining Predictions, was published in February 2010.
John is a follower of Christ and the proud father of 5.

Title: Analytics for Political Campaigns
Speaker: Rayid Ghani
Political campaigns today have access to large amounts of data about the actions, behaviors, and preferences of voters. The challenge they face is how to use this data most effectively to make quantitative decisions and guide election strategy across all channels (on the ground, online, and in the media). In this talk, I will discuss the types and breadth of data campaigns have access to, and the kinds of statistical approaches used to predict electoral outcomes and optimally allocate resources across different contact channels such as field programs, phone calls, TV, email, online, and social media. In addition, I’ll give examples from the Obama 2008 campaign of how data and empirical experiments were used to improve decision-making. I’ll end with a set of data and analytics challenges that are critical for political campaigns and that the analytics community can help solve.

Speaker Bio:
Rayid Ghani is the Chief Scientist at Obama for America focusing on data, analytics, technology, and social media for the Obama re-election campaign in 2012. Previously, Rayid was a Senior Researcher at Accenture Labs and led a research group focused on applied research in Machine Learning & Data Mining. His research interests include Machine Learning & Data Mining for Business Applications with special focus on Text Learning, Semi-supervised, and Active Learning. Most recently, Rayid has been involved in projects related to Healthcare Data Mining, Social Media, and Enterprise Information Retrieval and is now focusing on how to use data and analytics to influence and win political campaigns. Rayid has published several journal, conference, and workshop papers, edited and co-authored books, organized several workshops at ICML and KDD and has been a member of organizing and program committees of several conferences including ICML, KDD, SIGIR, ECML, and WWW. More information is available at

Gold Sponsors:
Microsoft, Yahoo! Labs

Silver Sponsors:
Google, SDSIC, IBM, SAS

Accenture