A machine learning approach using conditional normalizing flow to address extreme class imbalance problems in personal health records.

Yeongmin Kim, Wongyung Choi, Woojeong Choi, Grace Ko, Seonggyun Han, Hwan-Cheol Kim, Dokyoon Kim, Dong-Gi Lee, Dong Wook Shin, Younghee Lee

BioData Min

College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul National University, Seoul, Republic of Korea.

Published: May 2024

Supervised machine learning models are used to predict diseases, but class imbalance in the training data hampers them; this study applies a conditional normalizing flow model to improve predictions.
The study used health records from 706 South Korean individuals covering six chronic diseases, with particular focus on classifying diabetes, which had a low occurrence rate (about 2%).
The conditional normalizing flow model outperformed traditional supervised models, achieving a markedly higher AUPRC for diabetes (0.34 vs 0.16) and performing better on the other chronic diseases when disease-affected cases were scarce, which indicates its effectiveness in addressing class imbalance in medical data.

Background: Supervised machine learning models have been widely used to predict diseases and gain insight into them by classifying patients based on personal health records. However, class imbalance is an obstacle that disrupts the training of these models. In this study, we aimed to address class imbalance with a conditional normalizing flow model, a deep-learning-based semi-supervised model for anomaly detection. This is the first application of the normalizing flow algorithm to tabular biomedical data.
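
The anomaly-detection idea behind a normalizing flow can be illustrated with a deliberately minimal sketch. A flow maps a sample x to a latent z through an invertible transform and scores x by the change-of-variables log-likelihood, log p(x) = log p_Z(z) + log|det dz/dx|. Assuming, as is typical for semi-supervised anomaly detection, that the density is fitted on disease-unaffected records only, records that look unlike them receive a low likelihood and therefore a high anomaly score. The sketch uses a single elementwise affine transform with a standard-normal base distribution on synthetic data; it is not the conditional, deep-learning-based flow used in the study, and the helper names (`fit_affine_flow`, `log_likelihood`, `anomaly_score`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: an elementwise affine normalizing flow z = (x - mu) / sigma.
# This is NOT the conditional, deep-learning-based flow used in the study.

def fit_affine_flow(x_normal):
    """Fit mu and sigma by maximum likelihood on 'normal' (unaffected) samples.

    For this flow, the MLE is the per-feature mean and standard deviation.
    """
    mu = x_normal.mean(axis=0)
    sigma = x_normal.std(axis=0) + 1e-8  # guard against zero variance
    return mu, sigma

def log_likelihood(x, mu, sigma):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    z = (x - mu) / sigma
    log_base = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi), axis=1)
    log_det = -np.sum(np.log(sigma))  # dz/dx = diag(1 / sigma)
    return log_base + log_det

def anomaly_score(x, mu, sigma):
    """Negative log-likelihood: higher means less like the training (unaffected) data."""
    return -log_likelihood(x, mu, sigma)

# Synthetic toy data (not study data): unaffected records vs. shifted "affected" records.
rng = np.random.default_rng(0)
x_healthy = rng.normal(0.0, 1.0, size=(500, 4))
x_affected = rng.normal(3.0, 1.0, size=(10, 4))

mu, sigma = fit_affine_flow(x_healthy)
healthy_scores = anomaly_score(x_healthy, mu, sigma)
affected_scores = anomaly_score(x_affected, mu, sigma)
print(healthy_scores.mean(), affected_scores.mean())
```

A real flow stacks many learned invertible layers so the density can capture dependencies between features; the scoring rule (negative log-likelihood under the learned density) stays the same.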

Methods: We collected personal health records from South Korean citizens (n = 706), comprising genetic data obtained from a direct-to-consumer service (microarray chip), medical health check-ups, and lifestyle log data. Based on the health check-up data, six chronic diseases were labeled (obesity, diabetes, hypertriglyceridemia, dyslipidemia, liver dysfunction, and hypertension). After preprocessing, supervised classification models and semi-supervised anomaly detection models, including conditional normalizing flow, were evaluated by AUROC and AUPRC on the classification of diabetes, which had extreme target imbalance (about 2%). In addition, we evaluated their performance under the assumption that data collection for patients with the other chronic diseases was insufficient, simulated by undersampling disease-affected samples.
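
AUROC and AUPRC respond very differently to class imbalance. The following minimal NumPy sketch computes both: AUROC as the probability that a randomly chosen positive outranks a randomly chosen negative (ties count one half), and AUPRC as average precision, one common estimator. It assumes no tied scores for the AUPRC computation; library implementations may handle ties and interpolation differently. The helper names `auroc` and `average_precision` are hypothetical.

```python
import numpy as np

def auroc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank k of each positive.

    Assumes no tied scores.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y_sorted = y_true[order]
    hits = np.cumsum(y_sorted)                  # positives retrieved in the top k
    ranks = np.arange(1, len(y_sorted) + 1)     # k = 1, 2, ..., n
    precision_at_k = hits / ranks
    return float(precision_at_k[y_sorted == 1].mean())

# Toy example (not study data).
y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
print(auroc(y, s), average_precision(y, s))
```

AUROC's random baseline is 0.5 regardless of class balance, whereas a random scorer's AUPRC is approximately the base rate. That is why AUPRC values such as those reported in the Results for diabetes (0.16 and 0.34) are read against a floor of roughly 0.02, while the corresponding AUROC values (0.82 and 0.83) look nearly identical.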

Results: Across fifty evaluations of diabetes classification, whose base rate was very low (0.02), LightGBM (the best-performing supervised classification model) achieved an AUPRC of 0.16 and an AUROC of 0.82, whereas conditional normalizing flow achieved an AUPRC of 0.34 and an AUROC of 0.83. Moreover, conditional normalizing flow performed better than the supervised model for the other five chronic diseases (obesity, hypertriglyceridemia, dyslipidemia, liver dysfunction, and hypertension) when only a few disease-affected samples were available. For example, when undersampling disease-affected samples (positive undersampling) lowered the base rate to 0.02 for obesity prediction, LightGBM achieved an AUPRC of 0.20 and an AUROC of 0.75, whereas conditional normalizing flow achieved an AUPRC of 0.30 and an AUROC of 0.74.
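
Positive undersampling to a target base rate r keeps every negative and only as many positives n_pos as satisfy n_pos / (n_pos + n_neg) = r, that is, n_pos = r * n_neg / (1 - r). A minimal sketch follows; the helper `positive_undersample` is hypothetical, the random choice of which positives to keep is an assumption, and the toy label counts are illustrative rather than taken from the study.

```python
import numpy as np

def positive_undersample(y, target_rate, rng):
    """Return sorted indices keeping all negatives and a random subset of positives
    so that the positive fraction equals target_rate (rounded to a whole sample).

    Hypothetical helper: random selection of the kept positives is an assumption.
    """
    y = np.asarray(y)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    # Solve n_pos / (n_pos + n_neg) = target_rate for n_pos.
    n_keep = int(round(target_rate * len(neg) / (1.0 - target_rate)))
    if n_keep > len(pos):
        raise ValueError("not enough positives to reach target_rate")
    kept_pos = rng.choice(pos, size=n_keep, replace=False)
    return np.sort(np.concatenate([neg, kept_pos]))

# Toy labels (not study data): 490 unaffected, 110 affected.
y_toy = np.array([0] * 490 + [1] * 110)
idx = positive_undersample(y_toy, target_rate=0.02, rng=np.random.default_rng(0))
print(len(idx), y_toy[idx].sum(), y_toy[idx].mean())
```

Repeating the draw with different seeds gives a distribution of scores rather than a single number, which matters when only a handful of positives remain.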

Conclusions: Our research suggests the utility of conditional normalizing flow, particularly when the available cases are limited, for predicting chronic diseases using personal health records. This approach offers an effective solution to deal with sparse data and extreme class imbalances commonly encountered in the biomedical context.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11127363
DOI: http://dx.doi.org/10.1186/s13040-024-00366-0
