Social Impact Track

The Social Impact Track offers a half-day focused exclusively on innovative research and collaborative interdisciplinary projects in socially relevant areas. This year we have a rich set of presentations in areas supported by cross-sector partnerships. These span societal issues such as forced migration prediction, mitigating bias in social media, urban population vulnerability, informatics tools for public data sharing, and increasing the usefulness of UN humanitarian crisis data. In addition, we highlight novel data science education approaches, data science for social good programs, and diversity and inclusion efforts in data science. We seek to bring together a diverse community of researchers, students, and industry practitioners engaged in using data science for good and AI ethics. These cross-disciplinary and cross-sector partnerships show how state-of-the-art research and applications can be harnessed for societal benefit. We welcome your participation and insights as we engage in a rich discussion with our presenters.

Social Impact Workshop

Workshop Chairs: Sarah Stone (University of Washington), Vani Mandava (Microsoft Research)

Wednesday, August 7th, 8:15AM – 11:45AM, Tukatnu Ballroom - Level 3, Dena’ina

Program Details

8:15 - 8:45

Invited talk: Women in Data Science
Karen Matthys (Stanford University)

Big Data market revenues are projected to reach $103 billion by 2027. This expanding field presents a great opportunity for women and minorities to take on technical and leadership roles in all sectors. One way that we are addressing this opportunity is through the Women in Data Science Conference (WiDS), which was launched at Stanford four years ago and now reaches over 100,000 people worldwide. Social impact is a key driver and topic for WiDS. This talk will share outcomes from the WiDS global collaboration, lessons learned, and plans for WiDS 2020.

8:45 - 9:15

Invited talk: From intentions to impact: Articulating and embracing social complexity in Data Science for Social Good
Anissa Tanweer (University of Washington)

In this talk, Anissa Tanweer draws on five years of experience as an ethnographer observing and participating in the Data Science for Social Good (DSSG) program at the University of Washington. Using examples from DSSG projects, she surfaces several challenges in moving from the intention to do good toward the actualization of positive impact. Ultimately, our goal should be to embrace, rather than reduce, the messiness and complexity inherent in efforts to do good with data science.

9:15 - 9:30

Invited talk: Should You Open-Source Your Model? Ethical questions for open-sourcing Machine Learning models.
Robert Munro

This talk covers a set of ethical guidelines to help data scientists decide whether to open-source their Machine Learning models. Many of the questions are existing ones related to open-source data, while some are new questions specific to Machine Learning. Examples and lessons learned are drawn from real uses of open data and Machine Learning in disaster response and healthcare over the last decade. The talk provides a set of questions that data scientists should ask when deciding whether or not to open-source a model. It is recommended that data scientists create a “Model Statement” to be explicit about a model’s capabilities and to help with the decision.

9:30 - 10:00

Coffee Break

10:00 - 10:15

SciKid-Learn: An AI tool for customized education.
Andy Spezzatti (UC Berkeley), Mike Lawrence, Shine Shan, Niema El Bouri, Ada Tanyindawn, Aljaz Kosmerlj, James Hodson

Education is currently not optimized: the large gaps in level between students, and their individual differences, are not sufficiently leveraged. Based on these observations, we built SciKid-Learn, a mobile application for customized and adaptive learning that learns from students, recommends content, and provides tests to evaluate progress. The solution leverages state-of-the-art NLP techniques and uses web scraping for content recommendations. It was first developed with students in developing countries in mind, who lack easy access to a good education. Yet we also found it more generally relevant for every teaching institution, as a support to the traditional approach.

10:15 - 10:30

AI-Guided Virtual Lab for Autonomous Vehicle Test: Self-Play Reinforcement Learning Based Two-Player-Game.
Zhaobin Mo; Xuan Di (Columbia Univ)

Autonomous Vehicles (AVs) have shown great potential as a game-changer for the transportation system. The penetration rate of AVs in the US, however, remains below 0.01%. Guided by knowledge of traffic flow theory, machine learning, and game theory, the goal of this work is to develop a virtual lab, i.e., an AV test platform, where various autonomous driving algorithms are tested and validated in a hybrid traffic environment.

10:30 - 10:45

Mitigating Demographic Biases in Social Media-Based Recommender Systems.
Rashidul Islam; Kamrun Naher Keya; Shimei Pan; James Foulds (Univ of Maryland)

As a growing proportion of our daily human interactions are digitized and subjected to algorithmic decision-making on social media platforms, it has become increasingly important to ensure that these algorithms behave in a fair manner. In this work, we study fairness in collaborative-filtering recommender systems trained on social media data. We empirically demonstrate the prevalence of demographic bias in these systems for a large Facebook dataset. We then present a simple technique to mitigate bias in social media-based recommender systems.

10:45 - 11:00

Communicating Machine Learning Results About the Flint Water Crisis to City Residents at Scale.
Jared Webb (Univ of Michigan); Eric Schwartz (University of Michigan); Jacob Abernethy (Georgia Tech); Stacy Woods (NRDC)

We developed a combined active and machine learning approach that estimates, for each home in Flint, Michigan, the probability that it has lead pipes, in order to help the city minimize recovery costs. Over the past several years, our work has all been “backend,” dealing with legal teams, the city council, and the recovery team. Now, we are developing a public-facing website to communicate information and predictions to the citizenry. Our main outreach tool is an interactive map that a resident can use to observe the replacement efforts and our up-to-date predictions.

11:00 - 11:15

Their Futures Matter Family Investment Model.
Peter Mulquiney (Taylor Fry)

Their Futures Matter was set up by the Australian State of New South Wales following a 2015 review of the way government as a whole relates to vulnerable children and families. Central to this work is an Investment Model to project lifetime pathways for all NSW residents currently under age 25 (almost 3M people). By linking data at the individual level across multiple government agencies we can paint a rich picture of vulnerability across the population. By using data mining, statistical modelling, micro-simulation and actuarial valuation techniques we can identify patterns in that data which allow us to understand who in the population is likely to have poor outcomes and what those likely outcomes are across a person’s lifetime.

11:15 - 11:30

Machine Learning for Humanitarian Data: Tag Prediction using the HXL Standard.
Vinitra Swamy (Microsoft); Elisa Chen (UC Berkeley); Anish Vankayalapati (UC Berkeley); Abhay Aggarwal (UC Berkeley); Chloe Liu (UC Berkeley); Vani Mandava (Microsoft); Simon Johnson (UN Office of Humanitarian Affairs)

Humanitarian datasets created by government and non-profit organizations often face the challenge of data interoperability. In crisis response situations, field workers spend valuable time on data wrangling tasks that could be better spent contributing directly to relief efforts. We propose a supervised deep learning model to predict standardized Humanitarian eXchange Language (HXL) tags on datasets from the United Nations’ open data platform, the Humanitarian Data eXchange (HDX). This work is a collaboration between Microsoft, UC Berkeley, and the UN Office for the Coordination of Humanitarian Affairs (UN OCHA) with the goal of saving time for deployed crisis responders and making downstream data analysis and visualization tools more effective.

11:30 - 11:45

Mining large-scale news articles for predicting forced migration.
Sadra Abrishamkar (York University); Farouq Khonsari (York University)

Many people around the globe are displaced every day. Many of them are forced to leave their homes because of socio-political conflicts or human-made or natural disasters. In order to develop an early warning system for forced migration in the context of humanitarian crises, it is essential to study the factors that cause forced migration and to build a model that predicts the future number of displaced people. In this research, we focus on forced migration due to socio-political conflicts, for which violence is the main reason. In particular, we investigate whether the degree of violence in a specific region can be detected from news articles related to that region, and whether the detected violence scores can be used to improve prediction accuracy.
