Kaiping Zheng (National University of Singapore);Jinyang Gao (National University of Singapore);Kee Yuan Ngiam (National University Health System);Beng Chin Ooi (National University of Singapore);Wei Luen James Yip (National University Health System)
Electronic Medical Records (EMR) are the most fundamental resources used in healthcare data analytics. Since people visit hospitals more frequently when they feel sick, and doctors prescribe lab examinations when they deem them necessary, we argue that there could be a strong bias in EMR observations compared with the hidden conditions of patients. Directly using such EMR for analytic tasks without considering the bias may lead to misinterpretation. To this end, we propose a general method to resolve the bias by transforming EMR into regular patient hidden condition series using a Hidden Markov Model (HMM) variant. Compared with the biased EMR series with irregular time stamps, the unbiased regular time series is much easier for most analytic models to process and yields better results. Extensive experimental results demonstrate that our bias-resolving approach imputes missing values more accurately than baselines and improves the performance of state-of-the-art methods on typical medical data analytics tasks.