KDD Papers

A Data Science Approach to Understanding Residential Water Contamination in Flint

Jacob Abernethy (University of Michigan, Ann Arbor); Alex Chojnacki (Michigan Data Science Team); Chengyu Dai (Michigan Data Science Team); Arya Farahi (Michigan Data Science Team); Eric Schwartz (Ross School of Business); Jared Webb (Brigham Young University); Guangsha Shi (University of Michigan); Daniel T. Zhang (Michigan Data Science Team)


The Flint Water Crisis was followed by a huge investment by residents and government officials in sampling and testing the water in Flint homes to understand the causes and extent of the lead contamination. This trove of data, most of which was made publicly available, is by far the largest dataset collected on lead in a municipal water system. In this paper we study several aspects of Flint’s water troubles, and we lay out a number of analytical and algorithmic results on lead poisoning, many of which generalize well beyond one city. For example, we show that elevated lead risks are, perhaps surprisingly, predictable to a reasonable extent, and we explore various factors associated with elevated lead. These risk assessments, developed in large part via a crowdsourced prediction challenge at the University of Michigan, have been incorporated into an informational web and mobile application, funded by Google.org, designed for Flint residents. We also explore questions of self-selection in the residential testing program: which factors lead residents to voluntarily sample their water, when they test, and how often.