Background And Objective: In the medical field, data techniques for prediction and finding patterns of prevalent diseases are of increasing interest. Classification is one of the methods used to provide insight into predicting the future onset of type 2 diabetes of those at high risk of progression from pre-diabetes to diabetes. When applying classification techniques to real-world datasets, imbalanced class distribution has been one of the most significant limitations that leads to patients' misclassification. In this paper, we propose a novel balancing method to improve the prediction performance of type 2 diabetes mellitus in imbalanced electronic medical records (EMR).
Methods: A novel undersampling method is proposed by utilizing a fixed partitioning distribution scheme in a regular grid. The proposed approach retains valuable information when balancing methods are applied to datasets.
Results: The best AUC of 80% compared to other classifiers was obtained from the logistic regression (LR) classifier for EMR by applying our proposed undersampling method to balance the data. The new method improved the performance of the LR classifier compared to existing undersampling methods used in the balancing stage.
Conclusion: The results demonstrate the effectiveness and high performance of the proposed method for predicting diabetes in a Canadian imbalanced dataset. Our methodology can be used in other areas to overcome the limitations of imbalanced class distributions.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1016/j.cca.2021.08.023 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!