In practice, end-to-end data analysis is rarely a cleanly engineered process. Acquiring data can be tricky. Data assessment, wrangling and feature extraction are time-consuming and subjective. Models and algorithms used to derive data products are highly contextualized by time-varying properties of data sources, code and application needs. All of these issues would ideally benefit from an organizational view, but are often driven by individual users.

Viewed holistically, both agile analytics and the establishment of analytic pipelines involve interactions between people, computation and infrastructure. In this talk I’ll share some anecdotes from our research, user studies, and field experience with companies (Trifacta, Captricity), as well as an emerging open-source project (Ground).

Joseph M. Hellerstein is the Jim Gray Professor of Computer Science at the University of California, Berkeley, whose work focuses on data-centric systems and the way they drive computing. He is an ACM Fellow, an Alfred P. Sloan Research Fellow and the recipient of three ACM-SIGMOD “Test of Time” awards for his research. In 2010, Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT’s Technology Review magazine included his work on its TR10 list of the 10 technologies “most likely to change our world”.

Hellerstein is the co-founder and Chief Strategy Officer of Trifacta, a software vendor providing intelligent interactive solutions to the messy problem of wrangling data. He serves on the technical advisory boards of a number of computing and Internet companies, including EMC, SurveyMonkey, Captricity, and Dato, and previously served as the Director of Intel Research, Berkeley.