Active Deep Learning To Tune Down the Noise in Labels
Karan Samel (Astound); Xu Miao (Astound)
The great success of supervised learning has initiated a paradigm shift from building a deterministic software system to a probabilistic artificial intelligent system throughout the industry. The historical records in enterprise domains can potentially bootstrap the traditional business into the modern data-driven approach almost everywhere. The introduction of the Deep Neural Networks (DNNs) significantly reduces the efforts of feature engineering so that supervised learning becomes even more automated. The last bottleneck is to ensure the data quality, particularly the label quality, because the performance of supervised learning is bounded by the errors present in labels. In this paper, we present a new Active Deep Denoising (ADD) approach that first builds a DNN noise model, and then adopts an active learning algorithm to identify the optimal denoising function. We prove that under the low noise condition, we only need to query the oracle with log n examples where n is the total number in the data. We apply ADD on one enterprise application and show that it can effectively reduce 1/3 of the prediction error with only 0.1% of examples verified by the oracle.