The analysis of large-scale Electrical Medical Records (EMRs) has the potential to develop and optimize clinical treatment regimens. A treatment regimen usually includes a series of doctor orders containing rich temporal and heterogeneous information. However, in many existing studies, a doctor order is simplified as an event code and a treatment record is simplified as a code sequence. Thus, the information inherent in doctor orders is not fully used for in-depth analysis. In this paper, we aim at exploiting the rich information in doctor orders and developing data-driven approaches for improving clinical treatments. To this end, we first propose a novel method to measure the similarities between treatment records with consideration of sequential and multifaceted information in doctor orders. Then, we propose an efficient density-based clustering algorithm to summarize large-scale treatment records, and extract a semantic representation of each treatment cluster. Finally, we develop a unified frame-work to evaluate the discovered treatment regimens, and find the most effective treatment regimen for new patients. In the empirical study, we validate our methods with EMRs of 27,678 patients from 14 hospitals. The results show that: 1) Our method can successfully extract typical treatment regimens from large-scale treatment records. The extracted treatment regimens are intuitive and provide managerial implications for treatment regimen design and optimization. 2) By recommending the most effective treatment regimens, the total cure rate in our data improves from 19.89% to 21.28%, and the effective rate increases up to 98.29%.

Filed under: Dimensionality Reduction | Clustering | Mining Rich Data Types | Time Series and Stream Mining