Applied Data Science Invited Speakers

The Applied Data Science Invited Talks will provide a venue for leading experts in the world of applied data mining and knowledge discovery. These invited talks will feature highly influential speakers who have directly contributed to successful data mining applications in their respective fields. The talks and discussions will focus on innovative and leading-edge, large-scale industry or government applications of data mining in areas such as finance, health-care, bio-informatics, public policy, infrastructure, telecommunications, social media and computational advertising.


Keynote: Hongxia Yang

Hongxia Yang

Alibaba

AliGraph: A Comprehensive Graph Neural Network Platform

An increasing number of machine learning tasks require dealing with large graph datasets, which capture rich and complex relationships among potentially billions of elements. Graph Neural Networks (GNNs) have become an effective way to address the graph learning problem: they convert the graph data into a low-dimensional space while preserving as much of the structural and property information as possible, and construct a neural network for training and inference. However, it is challenging to provide efficient graph storage and computation capabilities that facilitate GNN training and enable the development of new GNN algorithms. In this paper, we present a comprehensive graph neural network platform, namely AliGraph, which consists of distributed graph storage, optimized sampling operators and a runtime to efficiently support not only existing popular GNNs but also a series of in-house developed ones for different scenarios. The system is currently deployed at Alibaba to support a variety of business scenarios, including product recommendation and personalized search on Alibaba’s e-commerce platform. In extensive experiments on a real-world dataset with 492.90 million vertices, 6.82 billion edges and rich attributes, AliGraph builds the graph an order of magnitude faster (5 minutes vs. the hours reported for the state-of-the-art PowerGraph platform). In training, AliGraph runs 40%-50% faster with the novel caching strategy and demonstrates a speedup of around 12 times with the improved runtime. In addition, our in-house developed GNN models all show statistically significant improvements in both effectiveness and efficiency (e.g., a 4.12%–17.19% lift in F1 score).


Dr. Hongxia Yang is a Senior Staff Data Scientist and Director at Alibaba Group. Her interests span Bayesian statistics, time series analysis, spatial-temporal modeling, survival analysis, machine learning, data mining and their applications to problems in business analytics and big data. She previously worked as a Principal Data Scientist at Yahoo! Inc and as a Research Staff Member at the IBM T.J. Watson Research Center, and received her PhD in Statistics from Duke University in 2010. She has published over 40 papers in top conferences and journals, holds 9 US patents, and serves as an associate editor for Applied Stochastic Models in Business and Industry. She was elected an Elected Member of the International Statistical Institute (ISI) in 2017 and a member of the Chinese Institute of Electronics in 2019.
