Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081).

Front Big Data

Erasmus University, Rotterdam, Netherlands.

Published: April 2020

Machine learning is increasingly being applied in healthcare, including to the diagnosis of mental health conditions such as depression, which typically relies on time-consuming standardized interviews.
The study used a dataset of 11,081 Dutch citizens in which only 570 reported depression, making this a class-imbalanced classification problem.
Several resampling techniques were used to address the imbalance; with over-sampling and over-under sampling, the Extreme Gradient Boosting (XGBoost) model reached balanced accuracy, precision, recall, and F1 scores above 0.90 when classifying depression cases.
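
To see why plain accuracy is misleading at this class ratio, the following minimal sketch scores a hypothetical classifier that always predicts "no depression" on labels with the reported counts (570 positive out of 11,081). The helper names `confusion_counts` and `metrics` are illustrative assumptions, not code from the paper, and the labels are synthetic placeholders that reproduce only the class ratio.

```python
# Hypothetical illustration: score an "always predict no depression" baseline
# on synthetic labels with the class ratio reported in the abstract.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = depression."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    """Compute accuracy, balanced accuracy, precision, recall and F1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Balanced accuracy is the mean of recall on each class.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

N_TOTAL = 11081      # sample size reported in the abstract
N_DEPRESSED = 570    # self-reported depression cases

y_true = [1] * N_DEPRESSED + [0] * (N_TOTAL - N_DEPRESSED)
y_pred = [0] * N_TOTAL  # majority-class baseline: never predicts depression

baseline = metrics(y_true, y_pred)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

The baseline is right on about 95% of cases (accuracy of roughly 0.949) while detecting no depression cases at all: recall, precision, and F1 are 0, and balanced accuracy is 0.5. This is why balanced accuracy and F1, rather than raw accuracy, are the informative figures for this dataset.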

Machine learning is on the rise, and healthcare is no exception. Within healthcare, mental health is receiving growing attention. The diagnosis of mental disorders is based on standardized patient interviews with a defined set of questions and scales, which is a time-consuming and costly process. Our objective was to apply a machine learning model and evaluate whether biomarker data have predictive power that could enhance the diagnosis of depression. In this paper, we explore the detection of depression cases in a dataset of 11,081 Dutch citizens. Most earlier studies used balanced datasets, in which the proportions of healthy and unhealthy cases are equal, but our dataset contains only 570 cases of self-reported depression out of 11,081, so this is a class-imbalanced classification problem. A machine learning model built on an imbalanced dataset gives predictions biased toward the majority class, so it tends to predict "no depression" even for true cases of depression. We used different resampling strategies to address the class imbalance problem. We created multiple samples using under-sampling, over-sampling, over-under sampling, and ROSE sampling to balance the dataset, and then applied the machine learning algorithm "Extreme Gradient Boosting" (XGBoost) to each sample to distinguish mental illness cases from healthy cases. The balanced accuracy, precision, recall, and F1 score obtained with over-sampling and over-under sampling were all above 0.90.
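
The resampling strategies named above can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not the authors' procedure: the abstract does not give target sample sizes or implementation details, so the function names (`oversample`, `undersample`, `over_under_sample`), the choice to balance classes exactly 50/50, and the fixed random seed are all hypothetical. ROSE, which generates synthetic examples rather than resampling existing rows, is not reproduced here.

```python
import random

def split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)

def oversample(labels, rng):
    """Random over-sampling: duplicate minority rows (drawn with
    replacement) until both classes have the majority's size."""
    minority, majority = split_by_class(labels)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

def undersample(labels, rng):
    """Random under-sampling: keep a random subset of the majority
    (drawn without replacement) equal in size to the minority."""
    minority, majority = split_by_class(labels)
    kept = rng.sample(majority, k=len(minority))
    return kept + minority

def over_under_sample(labels, rng, total=None):
    """Combined over-under sampling (assumed variant): shrink the
    majority and grow the minority so each has total // 2 rows."""
    minority, majority = split_by_class(labels)
    total = len(labels) if total is None else total
    half = total // 2
    kept = rng.sample(majority, k=min(half, len(majority)))
    grown = rng.choices(minority, k=half)  # with replacement
    return kept + grown

def class_counts(labels, indices):
    """Return (n_positive, n_negative) among the selected rows."""
    positives = sum(labels[i] for i in indices)
    return positives, len(indices) - positives

# Synthetic labels with the class ratio reported in the abstract.
labels = [1] * 570 + [0] * (11081 - 570)
rng = random.Random(0)  # fixed seed, chosen only for reproducibility

over_idx = oversample(labels, rng)
under_idx = undersample(labels, rng)
both_idx = over_under_sample(labels, rng)

print("over-sampling:      ", class_counts(labels, over_idx))
print("under-sampling:     ", class_counts(labels, under_idx))
print("over-under sampling:", class_counts(labels, both_idx))
```

Each function returns row indices, which would then select the rows used to fit a classifier such as XGBoost. In general practice, resampling is applied to the training split only; if duplicated minority rows also appear in the evaluation data, the reported metrics can be inflated.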

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931945	PMC
http://dx.doi.org/10.3389/fdata.2020.00015	DOI Listing
