Multi-Label Inference for Crowdsourcing
Jing Zhang (Nanjing University of Science and Technology); Xindong Wu (University of Louisiana at Lafayette)
When acquiring labels from crowdsourcing platforms, a task may include multiple labels, each of which takes its value from a set of distinct options; this setting is known as multi-class multi-label annotation. To improve label quality, each task is completed independently by a group of heterogeneous crowdsourced workers, and the true values of the task's labels are then inferred from these repeated noisy labels. In this paper, we propose a novel probabilistic method, which includes a multi-class multi-label dependency (MCMLD) model, to address this problem. The proposed method assumes that label correlation exists in both the unknown true labels and the noisy crowdsourced labels. It therefore introduces a mixture of multiple independent multinoulli distributions to capture the correlation among the labels. Finally, the unknown true values of the multiple labels of each task, together with a set of confusion matrices modeling the reliability of the workers, are jointly inferred through an EM algorithm. Experiments with three typical simulated crowdsourcing scenarios and a real-world dataset consistently show that the proposed MCMLD method significantly outperforms several competitive alternatives. Furthermore, when the labels are strongly correlated, the advantage of MCMLD is more pronounced.
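To illustrate why a mixture of independent multinoulli distributions can capture label correlation, the following minimal sketch computes the joint distribution of two three-class labels under a two-component mixture. It is not the MCMLD model itself: the names `mixing`, `component_params`, `joint_prob`, and `marginal_prob`, as well as all numeric values, are hypothetical and chosen only for illustration. Within each component the labels are independent, yet the mixture's joint probability differs from the product of its marginals, which is what correlation between the labels means.

```python
from itertools import product

# Hypothetical parameters (not taken from the paper):
# two mixture components, two labels, three classes per label.
mixing = [0.5, 0.5]

# component_params[k][m][c] = P(label m takes class c | component k)
component_params = [
    [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]],  # component 0 favours class 0 on both labels
    [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],  # component 1 favours class 2 on both labels
]


def joint_prob(labels):
    """P(y_1, ..., y_M) = sum_k mixing[k] * prod_m component_params[k][m][y_m]."""
    total = 0.0
    for weight, params in zip(mixing, component_params):
        p = weight
        for m, c in enumerate(labels):
            p *= params[m][c]  # labels are independent inside one component
        total += p
    return total


def marginal_prob(m, c):
    """P(y_m = c), obtained by averaging the per-component probabilities."""
    return sum(w * params[m][c] for w, params in zip(mixing, component_params))


num_classes = 3
joint = {y: joint_prob(y) for y in product(range(num_classes), repeat=2)}

p_joint = joint[(0, 0)]
p_independent = marginal_prob(0, 0) * marginal_prob(1, 0)
print(f"P(y1=0, y2=0)       = {p_joint:.4f}")
print(f"P(y1=0) * P(y2=0)   = {p_independent:.4f}")
```

With these illustrative values the joint probability of both labels being class 0 exceeds the product of the two marginals, so observing class 0 on one label raises the probability of class 0 on the other. In an EM setting, the mixing weights and per-component parameters would be estimated from data rather than fixed by hand as they are here.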