Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective.

Comput Methods Programs Biomed

Centre for Machine Vision, Bristol Robotics Laboratory, University of the West of England, Bristol, UK.

Published: June 2022

Background And Objective: Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to secrete or respond to insulin adequately. If not diagnosed in time or properly managed, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and so can be life-threatening. Many years of research in computational diagnosis of diabetes have pointed to machine learning as a viable solution for the prediction of diabetes. However, the accuracy rates reported to date suggest that there is still much room for improvement. In this paper, we propose a machine learning framework for diabetes prediction and diagnosis using the PIMA Indian dataset and the laboratory of the Medical City Hospital (LMCH) diabetes dataset. We hypothesize that adopting feature selection and missing value imputation methods can improve the performance of classification models in diabetes prediction and diagnosis.

Methods: In this paper, a robust framework for building a diabetes prediction model to aid in the clinical diagnosis of diabetes is proposed. The framework adopts Spearman correlation for feature selection and polynomial regression for missing value imputation, from a perspective that strengthens their performance. Further, three supervised machine learning models are proposed for classification: the random forest (RF) model, the support vector machine (SVM) model, and our designed twice-growth deep neural network (2GDNN) model. The models are optimized by tuning their hyperparameters using grid search and repeated stratified k-fold cross-validation, and are evaluated for their ability to scale to the prediction problem.
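The Spearman-based feature selection step can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the toy data, and the absolute-correlation threshold of 0.5 are all assumptions made for illustration, since the abstract does not state the selection rule. The sketch ranks each feature and the target (averaging tied ranks), computes the Pearson correlation of the ranks, and keeps features whose absolute coefficient meets the threshold.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, target, threshold):
    """Keep features whose |Spearman rho| with the target meets the threshold.

    The threshold rule is a hypothetical choice, not taken from the paper.
    """
    scores = {name: spearman(vals, target) for name, vals in columns.items()}
    kept = [name for name, rho in scores.items() if abs(rho) >= threshold]
    return kept, scores


# Toy data (invented): one informative feature and one uninformative one.
columns = {
    "glucose": [85, 90, 110, 140, 150, 180],
    "noise": [3, 1, 2, 3, 1, 2],
}
target = [0, 0, 0, 1, 1, 1]

kept, scores = select_features(columns, target, threshold=0.5)
print(kept)
print(scores)
```

On this toy data only the "glucose" column survives selection. In practice the threshold would itself be a tunable choice, and the real datasets would be loaded rather than hard-coded.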

Results: In experiments with the proposed 2GDNN model, precision, sensitivity, F1-score, train-accuracy, and test-accuracy scores of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25% are achieved on the PIMA Indian dataset, and scores of 97.28%, 97.33%, 97.27%, 99.57%, and 97.33% are achieved on the LMCH diabetes dataset.
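For readers unfamiliar with the reported metrics, the sketch below shows how precision, sensitivity (recall), F1-score, and accuracy follow from confusion-matrix counts for a binary classifier. It is an illustration only: the function name and the toy labels are invented, and the abstract does not say whether the paper's figures are positive-class or averaged across classes, so the positive-class form used here is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Compute positive-class metrics from binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Toy labels (invented): 3 TP, 1 FN, 3 TN, 1 FP.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Computing the same function on training and held-out predictions gives the train-accuracy and test-accuracy pair reported above; a large gap between the two would indicate overfitting.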

Conclusion: The data preprocessing approaches and the classifiers with hyperparameter optimization proposed within the machine learning framework yield a robust machine learning model that outperforms state-of-the-art results in diabetes mellitus prediction and diagnosis. The source code for the models of the proposed machine learning framework has been made publicly available.

Source
http://dx.doi.org/10.1016/j.cmpb.2022.106773

Publication Analysis

Top Keywords

machine learning: 28
diabetes: 12
diabetes mellitus: 12
prediction diagnosis: 12
learning framework: 12
diabetes prediction: 12
mellitus prediction: 8
data preprocessing: 8
machine: 8
diagnosis diabetes: 8

Similar Publications

Unveiling new therapeutic horizons in rheumatoid arthritis: an In-depth exploration of circular RNAs derived from plasma exosomes.

J Orthop Surg Res

January 2025

Department of Rheumatology and Immunology, Affiliated Hospital of Yangzhou University, Yangzhou University, No. 368 Hanjiang Middle Road, Yangzhou, Jiangsu, 225000, China.

Rheumatoid arthritis (RA), a chronic inflammatory joint disease causing permanent disability, involves exosomes, nanosized mammalian extracellular particles. Circular RNA (circRNA) serves as a biomarker in RA blood samples. This research screened differentially expressed circRNAs in RA patient plasma exosomes for novel diagnostic biomarkers.

Detection of early relapse in multiple myeloma patients.

Cell Div

January 2025

Babak Myeloma Group, Department of Pathophysiology, Faculty of Medicine, Masaryk University, Brno, Czech Republic.

Background: Multiple myeloma (MM) represents the second most common hematological malignancy characterized by the infiltration of the bone marrow by plasma cells that produce monoclonal immunoglobulin. While the quality and length of life of MM patients have significantly increased, MM remains a hard-to-treat disease; almost all patients relapse. As MM is highly heterogenous, patients relapse at different times.

Background: Amebiasis represents a significant global health concern. This is especially evident in developing countries, where infections are more common. The primary diagnostic method in laboratories involves the microscopy of stool samples.

Background: Osteoporosis (OP), often termed the "silent epidemic," poses a substantial public health burden. Emerging insights into the molecular functions of FBXW4 have spurred interest in its potential roles across various diseases.

Methods: This study explored FBXW4 by integrating DEGs from GEO datasets GSE2208, GSE7158, GSE56815, and GSE35956 with immune-related gene compilations from the ImmPort repository.

Background: Hypertension (HTN) is a global public health concern and a major risk factor for cardiovascular disease (CVD) and mortality. Insulin resistance (IR) plays a crucial role in HTN-related metabolic dysfunction, but its assessment remains challenging. The triglyceride-glucose (TyG) index and its derivatives (TyG-BMI, TyG-WC, and TyG-WHtR) have emerged as reliable IR markers.
