SIGKDD Proceedings

WSDM ‘19- Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

Full Citation in the ACM Digital Library

SESSION: Keynote & Invited Talks

Attending to What Matters

  • Jaime Teevan

Online services are increasingly intelligent. They evolve intelligently through A/B testing and experimentation, employ artificial intelligence in their core functionality using machine learning, and seamlessly engage human intelligence by connecting people in a low-friction manner. All of this has resulted in incredibly engaging experiences—but not particularly productive ones. As more and more of people’s most important tasks move online we need to think carefully about the underlying influence online services have on people’s ability to attend to what matters to them. There is an opportunity to use intelligence for this to do more than just not distract people and actually start helping people attend to what matters even better than they would otherwise. This presentation explores the ways we might make it as compelling and easy to start an important task as it is to check social media.

Responsible Data Science

  • H. V. Jagadish

Technologists have a responsibility to develop Data Science and AI methods that satisfy fairness, accountability, transparency, and ethical requirements. This statement has repeatedly been made in recent years and in many quarters, including major newspapers and magazines. The technical community has responded with work in this direction. However, almost all of this work has been directed towards the decision-making algorithm that performs a task such as scoring or classification. This presentation examines the Data Science pipeline, and points out the importance of addressing responsibility in all stages of this pipeline, and not just the decision-making stage. The presentation then outlines some recent research results that have been obtained in that regard.

Alexa Everywhere: AI for Daily Convenience

  • Rohit Prasad

The computing industry has been on an inexorable march toward simplifying human-computer interaction, and earlier this decade Amazon bet big on combining voice technology and artificial intelligence. In 2014, with the introduction of Echo and Alexa, Amazon created an entirely new technology category with an AI-first strategy and vision. Since then, Alexa has captured the imagination of customers across the globe, and the company has accelerated the pace of AI research and innovation in support of its promise to improve Alexa every day. In this presentation Rohit Prasad, Vice President and Head Scientist of Amazon Alexa, shares his insights into how recent scientific innovations are advancing Alexa.

Privacy-Preserving WSDM

  • Aleksandra Korolova

The goals of learning from user data and preserving user privacy are often considered to be in conflict. This presentation will demonstrate that there are contexts when provable privacy guarantees can be an enabler for better web search and data mining (WSDM), and can empower researchers hoping to change the world by mining sensitive user data. The presentation starts by motivating the rigorous statistical data privacy definition that is particularly suitable for today’s world of big data, differential privacy. It will then demonstrate how to achieve differential privacy for WSDM tasks when the data collector is trusted by the users. Using Chrome’s deployment of RAPPOR as a case study, it will be shown that achieving differential privacy while preserving utility is feasible even when the data collector is not trusted. The presentation concludes with open problems and challenges for the WSDM community.

Reinforcement Learning to Rank

  • Maarten de Rijke

Interactive systems such as search engines or recommender systems are increasingly moving away from single-turn exchanges with users. Instead, series of exchanges between the user and the system are becoming mainstream, especially when users have complex needs or when the system struggles to understand the user’s intent. Standard machine learning has helped us a lot in the single-turn paradigm, where we use it to predict: intent, relevance, user satisfaction, etc. When we think of search or recommendation as a series of exchanges, we need to turn to bandit algorithms to determine which action the system should take next, or to reinforcement learning to determine not just the next action but also to plan future actions and estimate their potential pay-off. The use of reinforcement learning for search and recommendations comes with a number of challenges, because of the very large action spaces, the large number of potential contexts, and noisy feedback signals characteristic for this domain. This presentation will survey some recent success stories of reinforcement learning for search, recommendation, and conversations; and will identify promising future research directions for reinforcement learning for search and recommendation.

SESSION: Session 1: Search and Ranking

Session details: Session 1: Search and Ranking

  • Maarten de Rijke University of Amsterdam

Fast Dictionary-Based Compression for Inverted Indexes

  • Giulio Ermanno Pibiri
  • Matthias Petri
  • Alistair Moffat

Dictionary-based compression schemes provide fast decoding operation, typically at the expense of reduced compression effectiveness compared to statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off balance between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.

Joint Optimization of Cascade Ranking Models

  • Luke Gallagher
  • Ruey-Cheng Chen
  • Roi Blanco
  • J. Shane Culpepper

Reducing excessive costs in feature acquisition and model evaluation has been a long-standing challenge in learning-to-rank systems. A cascaded ranking architecture turns ranking into a pipeline of multiple stages, and has been shown to be a powerful approach to balancing efficiency and effectiveness trade-offs in large-scale search systems. However, learning a cascade model is often complex, and usually performed stagewise independently across the entire ranking pipeline. In this work we show that learning a cascade ranking model in this manner is often suboptimal in terms of both effectiveness and efficiency. We present a new general framework for learning an end-to-end cascade of rankers using backpropagation. We show that stagewise objectives can be chained together and optimized jointly to achieve significantly better trade-offs globally. This novel approach is generalizable to not only differentiable models but also state-of-the-art tree-based algorithms such as LambdaMART and cost-efficient gradient boosted trees, and it opens up new opportunities for exploring additional efficiency-effectiveness trade-offs in large-scale search systems.

WassRank: Listwise Document Ranking Using Optimal Transport Theory

  • Hai-Tao Yu
  • Adam Jatowt
  • Hideo Joho
  • Joemon M. Jose
  • Xiao Yang
  • Long Chen

Learning to rank has been intensively studied and has shown great value in many fields, such as web search, question answering and recommender systems. This paper focuses on listwise document ranking, where all documents associated with the same query in the training data are used as the input. We propose a novel ranking method, referred to as WassRank, under which the problem of listwise document ranking boils down to the task of learning the optimal ranking function that achieves the minimum Wasserstein distance. Specifically, given the query level predictions and the ground truth labels, we first map them into two probability vectors. Analogous to the optimal transport problem, we view each probability vector as a pile of relevance mass with peaks indicating higher relevance. The listwise ranking loss is formulated as the minimum cost (the Wasserstein distance) of transporting (or reshaping) the pile of predicted relevance mass so that it matches the pile of ground-truth relevance mass. The smaller the Wasserstein distance is, the closer the prediction gets to the ground-truth. To better capture the inherent relevance-based order information among documents with different relevance labels and lower the variance of predictions for documents with the same relevance label, ranking-specific cost matrix is imposed. To validate the effectiveness of WassRank, we conduct a series of experiments on two benchmark collections. The experimental results demonstrate that: compared with four non-trivial listwise ranking methods (i.e., LambdaRank, ListNet, ListMLE and ApxNDCG), WassRank can achieve substantially improved performance in terms of nDCG and ERR across different rank positions. Specifically, the maximum improvements of WassRank over LambdaRank, ListNet, ListMLE and ApxNDCG in terms of [email protected] are 15%, 5%, 7%, 5%, respectively.

MSA: Jointly Detecting Drug Name and Adverse Drug Reaction Mentioning Tweets with Multi-Head Self-Attention

  • Chuhan Wu
  • Fangzhao Wu
  • Zhigang Yuan
  • Junxin Liu
  • Yongfeng Huang
  • Xing Xie

Twitter is a popular social media platform for information sharing and dissemination. Many Twitter users post tweets to share their experiences about drugs and adverse drug reactions. Automatic detection of tweets mentioning drug names and adverse drug reactions at a large scale has important applications such as pharmacovigilance. However, detecting drug name and adverse drug reaction mentioning tweets is very challenging, because tweets are usually very noisy and informal, and there are massive misspellings and user-created abbreviations for these mentions. In addition, these mentions are usually context dependent. In this paper, we propose a neural approach with hierarchical tweet representation and multi-head self-attention mechanism to jointly detect tweets mentioning drug names and adverse drug reactions. In order to alleviate the influence of massive misspellings and user-created abbreviations in tweets, we propose to use a hierarchical tweet representation model to first learn word representations from characters and then learn tweet representations from words. In addition, we propose to use multi-head self-attention mechanism to capture the interactions between words to fully model the contexts of tweets. Besides, we use additive attention mechanism to select the informative words to learn more informative tweet representations. Experimental results validate the effectiveness of our approach.

SESSION: Session 2: Knowledge Graphs and Analytics

Session details: Session 2: Knowledge Graphs and Analytics

  • Marc Najork Google

Integrating Local Context and Global Cohesiveness for Open Information Extraction

  • Qi Zhu
  • Xiang Ren
  • Jingbo Shang
  • Yu Zhang
  • Ahmed El-Kishky
  • Jiawei Han

Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interests. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extractions using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art open IE systems.

Knowledge Graph Enhanced Community Detection and Characterization

  • Shreyansh Bhatt
  • Swati Padhee
  • Amit Sheth
  • Keke Chen
  • Valerie Shalin
  • Derek Doran
  • Brandon Minnery

Recent studies show that by combining network topology and node attributes, we can better understand community structures in complex networks. However, existing algorithms do not explore “contextually” similar node attribute values, and therefore may miss communities defined with abstract concepts. We propose a community detection and characterization algorithm that incorporates the contextual information of node attributes described by multiple domain-specific hierarchical concept graphs. The core problem is to find the context that can best summarize the nodes in communities, while also discovering communities aligned with the context summarizing communities. We formulate the two intertwined problems, optimal community-context computation, and community discovery, with a coordinate-ascent based algorithm that iteratively updates the nodes’ community label assignment with a community-context and computes the best context summarizing nodes of each community. Our unique contributions include (1) a composite metric on Informativeness and Purity criteria in searching for the best context summarizing nodes of a community; (2) a node similarity measure that incorporates the context-level similarity on multiple node attributes; and (3) an integrated algorithm that drives community structure discovery by appropriately weighing edges. Experimental results on public datasets show nearly 20 percent improvement on F-measure and Jaccard for discovering underlying community structure over the current state-of-the-art of community detection methods. Community structure characterization was also accurate to find appropriate community types for four datasets.

Representation Interpretation with Spatial Encoding and Multimodal Analytics

  • Ninghao Liu
  • Mengnan Du
  • Xia Hu

Representation learning models map data instances into a low-dimensional vector space, thus facilitating the deployment of subsequent models such as classification and clustering models, or the implementation of downstream applications such as recommendation and anomaly detection. However, the outcome of representation learning is difficult to be directly understood by users, since each dimension of the latent space may not have any specific meaning. Understanding representation learning could be beneficial to many applications. For example, in recommender systems, knowing why a user instance is mapped to a certain position in the latent space may unveil the user’s interests and profile. In this paper, we propose an interpretation framework to understand and describe how representation vectors distribute in the latent space. Specifically, we design a coding scheme to transform representation instances into spatial codes to indicate their locations in the latent space. Following that, a multimodal autoencoder is built for generating the description of a representation instance given its spatial codes. The coding scheme enables indication of position with different granularity. The incorporation of autoencoder makes the framework capable of dealing with different types of data. Several metrics are designed to evaluate interpretation results. Experiments under various application scenarios and different representation learning models are conducted to demonstrate the flexibility and effectiveness of the proposed framework.

CORALS: Who Are My Potential New Customers? Tapping into the Wisdom of Customers’ Decisions

  • Ruirui Li
  • Jyun-Yu Jiang
  • Chelsea J.-T. Ju
  • Wei Wang

Identifying and recommending potential new customers for local businesses are crucial to the survival and success of local businesses. A key component to identifying the right customers is to understand the decision-making process of choosing a business over the others. However, modeling this process is an extremely challenging task as a decision is influenced by multiple factors. These factors include but are not limited to an individual’s taste or preference, the location accessibility of a business, and the reputation of a business from social media. Most of the recommender systems lack the power to integrate multiple factors together and are hardly extensible to accommodate new incoming factors. In this paper, we introduce a unified framework, CORALS, which considers the personal preferences of different customers, the geographical influence, and the reputation of local businesses in the customer recommendation task. To evaluate the proposed model, we conduct a series of experiments to extensively compare with 12 state-of-the-art methods using two real-world datasets. The results demonstrate that CORALS outperforms all these baselines by a significant margin in most scenarios. In addition to identifying potential new customers, we also break down the analysis for different types of businesses to evaluate the impact of various factors that may affect customers’ decisions. This information, in return, provides a great resource for local businesses to adjust their advertising strategies and business services to attract more prospective customers.

Lightweight Lexical and Semantic Evidence for Detecting Classes Among Wikipedia Articles

  • Marius Pasca
  • Travis Wolfe

A supervised method relies on simple, lightweight features in order to distinguish Wikipedia articles that are classes (Shield volcano) from other articles (Kilauea). The features are lexical or semantic in nature. Experimental results in multiple languages over multiple evaluation sets demonstrate the superiority of the proposed method over previous work.

ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text

  • Mohamed H. Gad-Elrab
  • Daria Stepanova
  • Jacopo Urbani
  • Gerhard Weikum

Fact-checking is a crucial task for accurately populating, updating and curating knowledge graphs. Manually validating candidate facts is time-consuming. Prior work on automating this task focuses on estimating truthfulness using numerical scores which are not human-interpretable. Others extract explicit mentions of the candidate fact in the text as an evidence for the candidate fact, which can be hard to directly spot. In our work, we introduce ExFaKT, a framework focused on generating human-comprehensible explanations for candidate facts. ExFaKT uses background knowledge encoded in the form of Horn clauses to rewrite the fact in question into a set of other easier-to-spot facts. The final output of our framework is a set of semantic traces for the candidate fact from both text and knowledge graphs. The experiments demonstrate that our rewritings significantly increase the recall of fact-spotting while preserving high precision. Moreover, we show that the explanations effectively help humans to perform fact-checking and can also be exploited for automating this task.

Interaction Embeddings for Prediction and Explanation in Knowledge Graphs

  • Wen Zhang
  • Bibek Paudel
  • Wei Zhang
  • Abraham Bernstein
  • Huajun Chen

Knowledge graph embedding aims to learn distributed representations for entities and relations, and is proven to be effective in many applications. Crossover interactions—bi-directional effects between entities and relations—- help select related information when predicting a new triple, but haven’t been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding which explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation as most previous methods do, but also generates multiple triple specific embeddings for both of them, named interaction embeddings. We evaluate embeddings on typical link prediction tasks and find that CrossE achieves state-of-the-art results on complex and more challenging datasets. Furthermore, we evaluate embeddings from a new perspective—giving explanations for predicted triples, which is important for real applications. In this work, an explanation for a triple is regarded as a reliable closed-path between the head and the tail entity. Compared to other baselines, we show experimentally that CrossE, benefiting from interaction embeddings, is more capable of generating reliable explanations to support its predictions.

Knowledge Graph Embedding Based Question Answering

  • Xiao Huang
  • Jingyuan Zhang
  • Dingcheng Li
  • Ping Li

Question answering over knowledge graph (QA-KG) aims to use facts in the knowledge graph (KG) to answer natural language questions. It helps end users more efficiently and more easily access the substantial and valuable knowledge in the KG, without knowing its data structures. QA-KG is a nontrivial problem since capturing the semantic meaning of natural language is difficult for a machine. Meanwhile, many knowledge graph embedding methods have been proposed. The key idea is to represent each predicate/entity as a low-dimensional vector, such that the relation information in the KG could be preserved. The learned vectors could benefit various applications such as KG completion and recommender systems. In this paper, we explore to use them to handle the QA-KG problem. However, this remains a challenging task since a predicate could be expressed in different ways in natural language questions. Also, the ambiguity of entity names and partial names makes the number of possible answers large. To bridge the gap, we propose an effective Knowledge Embedding based Question Answering (KEQA) framework. We focus on answering the most common types of questions, i.e., simple questions, in which each question could be answered by the machine straightforwardly if its single head entity and single predicate are correctly identified. To answer a simple question, instead of inferring its head entity and predicate directly, KEQA targets at jointly recovering the question’s head entity, predicate, and tail entity representations in the KG embedding spaces. Based on a carefully-designed joint distance metric, the three learned vectors’ closest fact in the KG is returned as the answer. Experiments on a widely-adopted benchmark demonstrate that the proposed KEQA outperforms the state-of-the-art QA-KG methods.

Relevance Search over Schema-Rich Knowledge Graphs

  • Yu Gu
  • Tianshuo Zhou
  • Gong Cheng
  • Ziyang Li
  • Jeff Z. Pan
  • Yuzhong Qu

Relevance search over a knowledge graph (KG) has gained much research attention. Given a query entity in a KG, the problem is to find its most relevant entities. However, the relevance function is hidden and dynamic. Different users for different queries may consider relevance from different angles of semantics. The ambiguity in a query is more noticeable in the presence of thousands of types of entities and relations in a schema-rich KG, which has challenged the effectiveness and scalability of existing methods. To meet the challenge, our approach called RelSUE requests a user to provide a small number of answer entities as examples, and then automatically learns the most likely relevance function from these examples. Specifically, we assume the intent of a query can be characterized by a set of meta-paths at the schema level. RelSUE searches a KG for diversified significant meta-paths that best characterize the relevance of the user-provided examples to the query entity. It reduces the large search space of a schema-rich KG using distance and degree-based heuristics, and performs reasoning to deduplicate meta-paths that represent equivalent query-specific semantics. Finally, a linear model is learned to predict meta-path based relevance. Extensive experiments demonstrate that RelSUE outperforms several state-of-the-art methods.

A State Transition Model for Mobile Notifications via Survival Analysis

  • Yiping Yuan
  • Jing Zhang
  • Shaunak Chatterjee
  • Shipeng Yu
  • Romer Rosales

Mobile notifications have become a major communication channel for social networking services to keep users informed and engaged. As more mobile applications push notifications to users, they constantly face decisions on what to send, when and how. A lack of research and methodology commonly leads to heuristic decision making. Many notifications arrive at an inappropriate moment or introduce too many interruptions, failing to provide value to users and spurring users’ complaints. In this paper we explore unique features of interactions between mobile notifications and user engagement. We propose a state transition framework to quantitatively evaluate the effectiveness of notifications. Within this framework, we develop a survival model for badging notifications assuming a log-linear structure and a Weibull distribution. Our results show that this model achieves more flexibility for applications and superior prediction accuracy than a logistic regression model. In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.

Clustered Monotone Transforms for Rating Factorization

  • Gaurush Hiranandani
  • Raghav Somani
  • Oluwasanmi Koyejo
  • Sreangsu Acharyya

Exploiting low-rank structure of the user-item rating matrix has been the crux of many recommendation engines. However, existing recommendation engines force raters with heterogeneous behavior profiles to map their intrinsic rating scales to a common rating scale (e.g. 1-5). This non-linear transformation of the rating scale shatters the low-rank structure of the rating matrix, therefore resulting in a poor fit and consequentially, poor recommendations. In this paper, we propose Clustered Monotone Transforms for Rating Factorization (CMTRF), a novel approach to perform regression up to unknown monotonic transforms over unknown population segments. Essentially, for recommendation systems, the technique searches for monotonic transformations of the rating scales resulting in a better fit. This is combined with an underlying matrix factorization regression model that couples the user-wise ratings to exploit shared low dimensional structure. The rating scale transformations can be generated for each user, for a cluster of users, or for all the users at once, forming the basis of three simple and efficient algorithms proposed in this paper, all of which alternate between transformation of the rating scales and matrix factorization regression. Despite the non-convexity, CMTRF is theoretically shown to recover a unique solution under mild conditions. Experimental results on two synthetic and seven real-world datasets show that CMTRF outperforms other state-of-the-art baselines.

Sparsemax and Relaxed Wasserstein for Topic Sparsity

  • Tianyi Lin
  • Zhiyue Hu
  • Xin Guo

Topic sparsity refers to the observation that individual documents usually focus on several salient topics instead of covering a wide variety of topics, and a real topic adopts a narrow range of terms instead of a wide coverage of the vocabulary. Understanding this topic sparsity is especially important for analyzing user-generated web content and social media, which are featured in the form of extremely short posts and discussions. As topic sparsity of individual documents in online social media increases, so does the difficulty of analyzing the online text sources using traditional methods. In this paper, we propose two novel neural models by providing sparse posterior distributions over topics based on the Gaussian sparsemax construction, enabling efficient training by stochastic backpropagation. We construct an inference network conditioned on the input data and infer the variational distribution with the relaxed Wasserstein (RW) divergence. Unlike existing works based on Gaussian softmax construction and Kullback-Leibler (KL) divergence, our approaches can identify latent topic sparsity with training stability, predictive performance, and topic coherence. Experiments on different genres of large text corpora have demonstrated the effectiveness of our models as they outperform both probabilistic and neural methods.

SESSION: Session 3: Recommendation and Temporal Trends

Session details: Session 3: Recommendation and Temporal Trends

  • Flora Salim RMIT University

RecWalk: Nearly Uncoupled Random Walks for Top-N Recommendation

  • Athanasios N. Nikolakopoulos
  • George Karypis

Random walks can provide a powerful tool for harvesting the rich network of interactions captured within item-based models for top-n recommendation. They can exploit indirect relations between the items, mitigate the effects of sparsity, ensure wider itemspace coverage, as well as increase the diversity of recommendation lists. Their potential however, is hindered by the tendency of the walks to rapidly concentrate towards the central nodes of the graph, thereby significantly restricting the range of K-step distributions that can be exploited for personalized recommendations. In this work we introduce RecWalk; a novel random walk-based method that leverages the spectral properties of nearly uncoupled Markov chains to provably lift this limitation and prolong the influence of users’ past preferences on the successive steps of the walk—allowing the walker to explore the underlying network more fruitfully. A comprehensive set of experiments on real-world datasets verify the theoretically predicted properties of the proposed approach and indicate that they are directly linked to significant improvements in top-n recommendation accuracy. They also highlight RecWalk’s potential in providing a framework for boosting the performance of item-based models. RecWalk achieves state-of-the-art top-n recommendation quality outperforming several competing approaches, including recently proposed methods that rely on deep neural networks.

Modeling Temporal Evidence from External Collections

  • Flávio Martins
  • João Magalhães
  • Jamie Callan

Newsworthy events are broadcast through multiple mediums and prompt the crowds to produce comments on social media. In this paper, we propose to leverage on this behavioral dynamics to estimate the most relevant time periods for an event (i.e., query). Recent advances have shown how to improve the estimation of the temporal relevance of such topics. In this approach, we build on two major novelties. First, we mine temporal evidences from hundreds of external sources into topic-based external collections to improve the robustness of the detection of relevant time periods. Second, we propose a formal retrieval model that generalizes the use of the temporal dimension across different aspects of the retrieval process. In particular, we show that temporal evidence of external collections can be used to (i) infer a topic’s temporal relevance, (ii) select the query expansion terms, and (iii) re-rank the final results for improved precision. Experiments with TREC Microblog collections show that the proposed time-aware retrieval model makes an effective and extensive use of the temporal dimension to improve search results over the most recent temporal models. Interestingly, we observe a strong correlation between precision and the temporal distribution of retrieved and relevant documents.

Asynchronous Training of Word Embeddings for Large Text Corpora

  • Avishek Anand
  • Megha Khosla
  • Jaspreet Singh
  • Jan-Hendrik Zab
  • Zijian Zhang

Word embeddings are a powerful approach for analyzing language and have been widely popular in numerous tasks in information retrieval and text mining. Training embeddings over huge corpora is computationally expensive because the input is typically sequentially processed and parameters are synchronously updated. Distributed architectures for asynchronous training that have been proposed either focus on scaling vocabulary sizes and dimensionality or suffer from expensive synchronization latencies. In this paper, we propose a scalable approach to train word embeddings by partitioning the input space instead in order to scale to massive text corpora while not sacrificing the performance of the embeddings. Our training procedure does not involve any parameter synchronization except a final sub-model merge phase that typically executes in a few minutes. Our distributed training scales seamlessly to large corpus sizes and we get comparable and sometimes even up to 45% performance improvement in a variety of NLP benchmarks using models trained by our distributed procedure which requires $1/10$ of the time taken by the baseline approach. Finally we also show that we are robust to missing words in sub-models and are able to effectively reconstruct word representations.

Social Attentional Memory Network: Modeling Aspect- and Friend-Level Differences in Recommendation

  • Chong Chen
  • Min Zhang
  • Yiqun Liu
  • Shaoping Ma

Social connections are known to be helpful for modeling users’ potential preferences and improving the performance of recommender systems. However, in social-aware recommendations, there are two issues which influence the inference of users’ preferences, and haven’t been well-studied in most existing methods: First, the preferences of a user may only partially match that of his friends in certain aspects, especially when considering a user with diverse interests. Second, for an individual, the influence strength of his friends might be different, as not all friends are equally helpful for modeling his preferences in the system. To address the above issues, in this paper, we propose a novel Social Attentional Memory Network (SAMN) for social-aware recommendation. Specifically, we first design an attention-based memory module to learn user-friend relation vectors, which can capture the varying aspect attentions that a user share with his different friends. Then we build a friend-level attention component to adaptively select informative friends for user modeling. The two components are fused together to mutually enhance each other and lead to a finer extended model. Experimental results on three publicly available datasets show that the proposed SAMN model consistently and significantly outperforms the state-of-the-art recommendation methods. Furthermore, qualitative studies have been made to explore what the proposed attention-based memory module and friend-level attention have learnt, which provide insights into the model’s learning process.

Uncovering Hidden Structure in Sequence Data via Threading Recurrent Models

  • Manzil Zaheer
  • Amr Ahmed
  • Yuan Wang
  • Daniel Silva
  • Marc Najork
  • Yuchen Wu
  • Shibani Sanan
  • Surojit Chatterjee

Long Short-Term Memory (LSTM) is one of the most powerful sequence models for user browsing history \citetan2016improved,korpusik2016recurrent or natural language text \citemikolov2010recurrent.Despite the strong performance, it has not gained popularity for user-facing applications, mainly owing to a large number of parameters and lack of interpretability. Recently \citetzaheer2017latent introduced latent LSTM Allocation (LLA) to address these problems by incorporating topic models with LSTM, where the topic model maps observed words in each sequence to topics that evolve using an LSTM model. In our experiments, we found the resulting model, although powerful and interpretable, to show shortcomings when applied to sequence data that exhibit multi-modes of behaviors with abrupt dynamic changes. To address this problem we introduce thLLA: a threading LLA model. thLLA has the ability to break each sequence into a set of segments and then model the dynamic in each segment using an LSTM mixture. In that way, thLLA can model abrupt changes in sequence dynamics and provides a better fit for sequence data while still being interpretable and requiring fewer parameters. In addition, thLLA uncovers hidden themes in the data via its dynamic mixture components. However, such generalization and interpretability come at a cost of complex dependence structure, for which inference would be extremely non-trivial. To remedy this, we present an efficient sampler based on particle MCMC method for inference that can draw from the joint posterior directly. Experimental results confirm the superiority of thLLA and the stability of the new inference algorithm on a variety of domains.

SESSION: Session 4: FATE & Privacy

Session details: Session 4: FATE & Privacy

  • Fernando Diaz Microsoft

Neural Based Statement Classification for Biased Language

  • Christoph Hube
  • Besnik Fetahu

Biased language commonly occurs around topics which are of controversial nature, thus, stirring disagreement between the different involved parties of a discussion. This is due to the fact that for language and its use, specifically, the understanding and use of phrases, the stances are cohesive within the particular groups. However, such cohesiveness does not hold across groups. In collaborative environments or environments where impartial language is desired (e.g. Wikipedia, news media), statements and the language therein should represent equally the involved parties and be neutrally phrased. Biased language is introduced through the presence of inflammatory words or phrases, or statements that may be incorrect or one-sided, thus violating such consensus. In this work, we focus on the specific case of phrasing bias, which may be introduced through specific inflammatory words or phrases in a statement. For this purpose, we propose an approach that relies on a recurrent neural networks in order to capture the inter-dependencies between words in a phrase that introduced bias. We perform a thorough experimental evaluation, where we show the advantages of a neural based approach over competitors that rely on word lexicons and other hand-crafted features in detecting biased language. We are able to distinguish biased statements with a precision of P=0.917, thus significantly outperforming baseline models with an improvement of over 30%. Finally, we release the largest corpus of statements annotated for biased language.

Enabling Privacy-Preserving Sharing of Genomic Data for GWASs in Decentralized Networks

  • Yanjun Zhang
  • Xin Zhao
  • Xue Li
  • Mingyang Zhong
  • Caitlin Curtis
  • Chen Chen

The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In this work, we propose a preventive approach for privacy-preserving sharing of genomic data in decentralized networks for Genome-wide association studies (GWASs), which have been widely used in discovering the association between genotypes and phenotypes. The key components of this work are: a decentralized secure network, with a privacy- preserving sharing protocol, and a gene fragmentation framework that is trainable in an end-to-end manner. Our experiments on real datasets show the effectiveness of our privacy-preserving approaches as well as significant improvements in efficiency when compared with recent, related algorithms.

Protecting User Privacy: An Approach for Untraceable Web Browsing History and Unambiguous User Profiles

  • Ghazaleh Beigi
  • Ruocheng Guo
  • Alexander Nou
  • Yanchao Zhang
  • Huan Liu

The overturning of the Internet Privacy Rules by the Federal Communications Commissions (FCC) in late March 2017 allows Internet Service Providers (ISPs) to collect, share and sell their customers’ Web browsing data without their consent. With third-party trackers embedded on Web pages, this new rule has put user privacy under more risk. The need arises for users on their own to protect their Web browsing history from any potential adversaries. Although some available solutions such as Tor, VPN, and HTTPS can help users conceal their online activities, their use can also significantly hamper personalized online services, i.e., degraded utility. In this paper, we design an effective Web browsing history anonymization scheme, PBooster, aiming to protect users’ privacy while retaining the utility of their Web browsing history. The proposed model pollutes users’ Web browsing history by automatically inferring how many and what links should be added to the history while addressing the utility-privacy trade-off challenge. We conduct experiments to validate the quality of the manipulated Web browsing history and examine the robustness of the proposed approach for user privacy protection.

Spiral of Silence in Recommender Systems

  • Dugang Liu
  • Chen Lin
  • Zhilin Zhang
  • Yanghua Xiao
  • Hanghang Tong

It has been established that, ratings are missing not at random in recommender systems. However, little research has been done to reveal how the ratings are missing. In this paper we present one possible explanation of the missing not at random phenomenon. We verify that, using a variety of different real-life datasets, there is a spiral process for a silent minority in recommender systems where (1) people whose opinions fall into the minority are less likely to give ratings than majority opinion holders; (2) as the majority opinion becomes more dominant, the rating possibility of a majority opinion holder is intensifying but the rating possibility of a minority opinion holder is shrinking; (3) only hardcore users remain to rate for minority opinions when the spiral achieves its steady state. Our empirical findings are beneficial for future recommendation models. To demonstrate the impact of our empirical findings, we present a probabilistic model that mimics the generation process of spiral of silence. We experimentally show that, the presented model offers more accurate recommendations, compared with state-of-the-art recommendation models.

Fighting Fire with Fire: Using Antidote Data to Improve Polarization and Fairness of Recommender Systems

  • Bashir Rastegarpanah
  • Krishna P. Gummadi
  • Mark Crovella

The increasing role of recommender systems in many aspects of society makes it essential to consider how such systems may impact social good. Various modifications to recommendation algorithms have been proposed to improve their performance for specific socially relevant measures. However, previous proposals are often not easily adapted to different measures, and they generally require the ability to modify either existing system inputs, the system’s algorithm, or the system’s outputs. As an alternative, in this paper we introduce the idea of improving the social desirability of recommender system outputs by adding more data to the input, an approach we view as as providing ‘antidote’ data to the system. We formalize the antidote data problem, and develop optimization-based solutions. We take as our model system the matrix factorization approach to recommendation, and we propose a set of measures to capture the polarization or fairness of recommendations. We then show how to generate antidote data for each measure, pointing out a number of computational efficiencies, and discuss the impact on overall system accuracy. Our experiments show that a modest budget for antidote data can lead to significant improvements in the polarization or fairness of recommendations.

Copyrights © 2018 All Rights Reserved - SIGKDD
ACM Code of Conduct