Identifying rare but significant healthcare events in massive unstructured datasets has become a common task in healthcare data analytics. However, the imbalanced class distribution found in many practical datasets greatly hampers the detection of rare events, because most classification methods implicitly assume that classes occur equally often and are designed to maximize overall classification accuracy. In this study, we develop a framework for learning from healthcare data with an imbalanced distribution by incorporating different rebalancing strategies. The evaluation results show that the developed framework can significantly improve the detection accuracy of medical incidents due to look-alike sound-alike (LASA) mix-ups. Specifically, logistic regression combined with the synthetic minority oversampling technique (SMOTE) produces the best detection results, with a 45.3% relative increase in recall (recall = 75.7%) compared with logistic regression alone (recall = 52.1%).
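As a rough illustration of the rebalancing idea, the following minimal sketch shows the interpolation step at the core of SMOTE, together with the recall arithmetic quoted above. It is not the study's implementation: the function names (`smote`, `recall`, `relative_increase`), the default of five neighbours, and the list-of-lists feature format are all assumptions made for this example.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class cases that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def relative_increase(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Toy minority class with two features; real data would be text-derived.
    rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote(rare, n_new=6, k=2)), "synthetic samples generated")
    # Recall figures reported in the abstract.
    print(round(relative_increase(75.7, 52.1), 1), "% relative increase")
```

In a typical workflow, the synthetic samples are added to the training split only, so that recall is still measured on untouched, imbalanced test data.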

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5987310
DOI: http://dx.doi.org/10.1155/2018/6275435
