Semi-Supervised Learning

Curated by: Jerry Xiaojin Zhu

Semi-supervised learning uses both labeled and unlabeled data to improve supervised learning. The goal is to learn a predictor that predicts future test data better than the predictor learned from the labeled training data alone. Semi-supervised learning is motivated by its practical value in learning faster, better, and cheaper. In many real-world applications, it is relatively easy to acquire a large amount of unlabeled data x. For example, documents can be crawled from the Web, images can be obtained from surveillance cameras, and speech can be collected from broadcast. However, the corresponding labels y for the prediction task, such as sentiment orientation, intrusion detection, and phonetic transcript, often require slow human annotation and expensive laboratory experiments. This labeling bottleneck results in a scarcity of labeled data and a surplus of unlabeled data. Therefore, being able to utilize the surplus unlabeled data is desirable. Common semi-supervised learning methods include generative models, semi-supervised support vector machines, graph Laplacian based methods, co-training, and multiview learning. These methods make different assumptions about the link between the unlabeled data distribution and the classification function. Such assumptions are equivalent to prior domain knowledge, and the success of semi-supervised learning depends to a large degree on the validity of the assumptions.
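As a rough illustration of the graph-based family mentioned above, the following minimal Python sketch propagates two known labels through a similarity graph built over all points, labeled and unlabeled. It assumes that nearby points tend to share a label. The function name propagate_labels, the Gaussian bandwidth, the iteration count, and the toy 1-D data are all assumptions made for this example; it is a simplified sketch and does not reproduce the method of any particular work listed on this page.

```python
import math


def propagate_labels(points, labels, bandwidth=1.0, iterations=200):
    """Iterative label propagation on a fully connected similarity graph.

    points: list of floats (1-D features, to keep the sketch short).
    labels: list holding 0 or 1 for labeled points and None for unlabeled ones.
    Returns one score per point in [0, 1]; a score above 0.5 means class 1.
    """
    n = len(points)
    # Gaussian edge weights: nearby points are strongly connected.
    # bandwidth is a hypothetical tuning parameter chosen for the toy data.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist_sq = (points[i] - points[j]) ** 2
                weights[i][j] = math.exp(-dist_sq / (2 * bandwidth ** 2))

    # Unlabeled points start undecided at 0.5; labeled points keep their label.
    scores = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp labeled points to their known label on every pass.
                updated.append(float(labels[i]))
            else:
                # An unlabeled point takes the weighted average of its neighbors.
                total = sum(weights[i])
                updated.append(
                    sum(w * s for w, s in zip(weights[i], scores)) / total
                )
        scores = updated
    return scores


# Toy data: two well-separated clusters with a single labeled point in each.
points = [0.0, 0.3, 0.6, 0.9, 5.0, 5.3, 5.6, 5.9]
labels = [0, None, None, None, None, None, None, 1]

scores = propagate_labels(points, labels)
predictions = [int(s > 0.5) for s in scores]
print(predictions)
```

With only two labeled points, a purely supervised learner has very little to go on. Here the unlabeled points supply the cluster structure, and each cluster inherits the label of its single labeled member. If the underlying assumption fails, for instance when the classes overlap heavily, the same procedure can spread wrong labels, which is the dependence on assumptions described above.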


O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, 2006.

A. Subramanya and P. Talukdar. Graph-Based Semi-Supervised Learning. Morgan & Claypool Publishers, 2014.

X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009.

Related KDD2016 Papers

Title & Authors
Smart Reply: Automated Response Suggestion for Email
Author(s): Anjuli Kannan; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins; Balint Miklos, Google, Inc.; Greg Corrado; László Lukács; Marina Ganea; Peter Young; Vivek Ramavajjala
Partial Label Learning via Feature-Aware Disambiguation
Author(s): Min-Ling Zhang*, Southeast University; Binbin Zhou, Southeast University; Xu-Ying Liu, Southeast University
A Multi-Task Learning Formulation for Survival Analysis
Author(s): Yan Li*, Wayne State University; Jie Wang, University of Michigan; Jieping Ye, University of Michigan at Ann Arbor; Chandan Reddy, Wayne State University
Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems
Author(s): Igor Melnyk*, University of Minnesota; Arindam Banerjee, University of Minnesota; Bryan Matthews, NASA Ames Research Center; Nikunj Oza, NASA Ames Research Center
Overcoming key weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilari
Author(s): Ting Kai Ming*, Federation University; Ye Zhu, Monash University; Mark Carman, Monash University; Yue Zhu, Nanjing University
Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding
Author(s): Xiang Ren*, UIUC; Wenqi He, UIUC; Meng Qu, UIUC; Heng Ji, PRI; Clare Voss, ARL; Jiawei Han, University of Illinois at Urbana-Champaign
Fast Unsupervised Online Drift Detection Using Incremental Kolmogorov-Smirnov Test
Author(s): Denis Dos Reis*, Universidade de São Paulo; Gustavo Batista, Universidade de Sao Paulo at Sao Carlos; Peter Flach, University of Bristol; Stan Matwin, Dalhousie University
FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
Author(s): Bryan Hooi*, Carnegie Mellon University; Hyun Ah Song, Carnegie Mellon University; Alex Beutel, Carnegie Mellon University; Neil Shah, Carnegie Mellon University; Kijung Shin, Carnegie Mellon University; Christos Faloutsos, Carnegie Mellon University
Goal-Directed Inductive Matrix Completion
Author(s): Si Si*, UT Austin; Kai-Yang Chiang, UT Austin; Cho-Jui Hsieh, UT Austin; Nikhil Rao, Technicolor Research; Inderjit Dhillon, UTexas
Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning
Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech