KDD Topics

Data Reliability and Truthfulness

Curated by: Jiawei Han and Jing Gao

Data reliability problems make many decision-making tasks difficult: when data contains inconsistent, inaccurate, or even false information, it can mislead decisions and eventually cause severe losses. Unfortunately, we cannot expect real-world data to be clean and accurate; inconsistency, ambiguity, and uncertainty are widespread. These ubiquitous veracity problems have motivated numerous efforts to improve information quality, trustworthiness, and reliability. The efforts approach the problem from different perspectives in order to identify reliable information sources and trustworthy claims. Some popular subtopics are listed below:

(1) Truth discovery is an emerging topic that has attracted much attention. The goal is to discover truths from multiple conflicting information sources without supervision. The basic idea is to estimate source reliability and claim trustworthiness simultaneously by examining the relationship between sources and claims.

(2) Many efforts have been devoted to detecting spam, rumors, and other types of untruthful information in the online world. Supervised approaches are typically adopted, in which features are extracted to capture the distinctions between rumors (or spam) and facts. Trust between sources (users) is also an important factor in evaluating information trustworthiness, and many graph-based approaches have been developed to assess source trustworthiness.

(3) Anomaly detection and denoising also contribute directly to solving the data reliability issue. Anomaly detection identifies data points that deviate significantly from the rest of the data. Denoising reduces the level of noise in the data and repairs low-quality data by exploiting underlying distributions or patterns in the data.
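The basic idea behind truth discovery in (1) can be illustrated with a minimal sketch. It is not the method of any specific paper listed on this page; it is a generic iterative weighted-voting loop, written under several assumptions: claims arrive as hypothetical (source, object, value) triples, values are categorical, and a source's weight is the negative log of its smoothed error rate. The function name `discover_truths` and the smoothing constants are illustrative choices.

```python
import math
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """Jointly estimate truths and source weights (illustrative sketch).

    claims: list of (source, object, value) triples (assumed input format).
    Returns (truths, weights): object -> value, source -> weight.
    """
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # start from plain majority voting
    truths = {}
    for _ in range(iterations):
        # Step 1: for each object, pick the value with the largest
        # total weight among the sources that claim it.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += weights[s]
        truths = {o: max(vals, key=vals.get) for o, vals in votes.items()}

        # Step 2: re-estimate each source's weight from how often it
        # disagrees with the current truth estimates.
        errors = defaultdict(int)
        counts = defaultdict(int)
        for s, o, v in claims:
            counts[s] += 1
            errors[s] += int(v != truths[o])
        # Smoothing (assumed constants) keeps the rate strictly in (0, 1),
        # so every weight stays finite and positive.
        weights = {
            s: -math.log((errors[s] + 0.5) / (counts[s] + 1.0))
            for s in sources
        }
    return truths, weights


# Toy data: source A is always right; B and C each err on four of the
# first eight questions and agree on the same wrong answer for q8.
claims = []
for i in range(8):
    wrong = "B" if i % 2 else "C"
    for s in ("A", "B", "C"):
        claims.append((s, f"q{i}", "bad" if s == wrong else "good"))
claims += [("A", "q8", "good"), ("B", "q8", "bad"), ("C", "q8", "bad")]

truths, weights = discover_truths(claims)
```

On this toy data, plain majority voting would choose "bad" for q8 because two of three sources claim it. After the first weight update, source A outweighs B and C combined, so the estimate for q8 becomes "good". Real truth discovery methods differ in how they model source weights, value types, and source dependence.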

Surveys:

Yaliang Li, Jing Gao, Chuishi Meng, Qi Li, Lu Su, Bo Zhao, Wei Fan, Jiawei Han. A Survey on Truth Discovery. SIGKDD Explorations Newsletter, 17(2):1-16, 2015.

Xian Li, Xin Luna Dong, K. B. Lyons, Weiyi Meng, Divesh Srivastava. Truth finding on the deep web: Is the problem solved? PVLDB, 6(2):97-108, 2012.

Manish Gupta, Jiawei Han. Heterogeneous Network-Based Trust Analysis: A Survey. SIGKDD Explorations Newsletter, 13(1):60-77, 2011.

Meng Jiang, Peng Cui, Christos Faloutsos. Suspicious Behavior Detection: Current Trends and Future Directions. Special Issue on Online Behavioral Analysis and Modeling, IEEE Intelligent Systems Magazine, 2016.

Jiliang Tang, Huan Liu. Trust in Social Media. Morgan & Claypool Publishers, 2015.

Data and code:

http://www.cse.buffalo.edu/~jing/software.htm

http://lunadong.com/fusionDataSets.htm

http://cogcomp.cs.illinois.edu/page/resource_view/16

http://www.jiliang.xyz/trust.html

Related KDD2016 Papers

Title & Authors
Towards Confidence in the Truth: A Bootstrapping based Truth Discovery Approach
Author(s): Houping Xiao*, SUNY Buffalo; Jing Gao; Qi Li, SUNY Buffalo; Fenglong Ma, SUNY Buffalo; Lu Su, SUNY Buffalo; Yunlong Feng, KU Leuven; Aidong Zhang

From Truth Discovery to Trustworthy Opinion Discovery: An Uncertainty-Aware Quantitative Modeling Ap
Author(s): Mengting Wan*, UC San Diego; Xiangyu Chen, University of Illinois, Urbana-Champaign; Lance Kaplan, U.S. Army Research Laboratory; Jiawei Han, University of Illinois at Urbana-Champaign; Jing Gao; Bo Zhao, LinkedIn
