Unsupervised Discovery of Drug Side-Effects From Heterogeneous Data Sources

Fenglong Ma (SUNY Buffalo); Chuishi Meng (SUNY Buffalo); Houping Xiao (SUNY Buffalo); Qi Li (SUNY Buffalo); Jing Gao (SUNY Buffalo); Lu Su (SUNY Buffalo); Aidong Zhang (SUNY Buffalo)

Abstract

Drug side-effects have become a worldwide public health concern and are the fourth leading cause of death in the United States. The pharmaceutical industry has devoted tremendous effort to identifying drug side-effects during drug development, but it is impractical to identify all of them at that stage. Fortunately, drug side-effects are also reported in heterogeneous data sources, such as the FDA Adverse Event Reporting System and various online communities. Existing supervised and semi-supervised approaches are not practical for mining these sources, however, because annotating labels is expensive in the medical field. In this paper, we propose Sifter, a novel and effective unsupervised model that automatically discovers drug side-effects. Sifter improves the estimation of drug side-effects by learning from various online platforms and measuring platform-level and user-level quality simultaneously. Experimental results on five real-world datasets show that Sifter significantly improves the identification of side-effects compared with state-of-the-art approaches.