An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance.

Bioengineering (Basel)

SP Jain Global School of Management, Academic City, Dubai P.O. Box 502345, United Arab Emirates.

Published: January 2025

Diabetes raises blood sugar levels, which damages various parts of the human body. Diabetes data are used not only to provide a deeper understanding of treatment mechanisms but also to predict the probability that a person will develop the disease. This paper proposes a novel methodology for classification under heavy class imbalance, as observed in the PIMA diabetes dataset. The proposed methodology uses two novel steps, namely resampling and random shuffling, before the classification model is defined. The methodology is tested with two versions of cross validation that are appropriate in cases of class imbalance: k-fold cross validation and stratified k-fold cross validation. Our findings suggest that, with imbalanced data, shuffling the data randomly prior to a train/test split can help improve estimation metrics. Our methodology can outperform existing machine learning algorithms and complex deep learning models. Applying it is a simple and fast way to predict labels under class imbalance. It does not require additional techniques to balance classes, and it does not involve preselecting important variables, which saves time and makes the model easy to analyze. This makes it an effective methodology for initial and further modeling of data with class imbalance. Moreover, our methodology shows how to increase the effectiveness of machine learning models based on standard approaches and make them more reliable.
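The abstract does not spell out the implementation, so the following is only a minimal sketch of two ideas it names: shuffling the data before splitting, and building stratified k-fold partitions. It is not the authors' code. The helper names `shuffle_rows` and `stratified_folds`, the toy data, and the seed are all assumptions made for illustration, and the resampling step is omitted because the abstract gives no details about it.

```python
import random
from collections import Counter, defaultdict


def shuffle_rows(rows, labels, seed=0):
    """Shuffle rows and labels together with a fixed seed (hypothetical helper)."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    return [rows[i] for i in order], [labels[i] for i in order]


def stratified_folds(labels, k):
    """Assign each index to one of k folds so class ratios stay roughly equal."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy imbalanced data (an assumption, not PIMA): 20 negatives, then 10 positives.
labels = [0] * 20 + [1] * 10
rows = [[float(i)] for i in range(len(labels))]

# Shuffle first so the original ordering cannot leak into the folds.
rows, labels = shuffle_rows(rows, labels, seed=42)
folds = stratified_folds(labels, k=5)

fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
for n, counts in enumerate(fold_counts):
    print(f"fold {n}: negatives={counts[0]}, positives={counts[1]}")
```

With this toy data every fold receives 4 negatives and 2 positives, so each held-out fold preserves the 2:1 class ratio of the full set. Plain k-fold on the unshuffled, label-sorted data would instead produce folds containing only one class.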

DOI: http://dx.doi.org/10.3390/bioengineering12010035

Similar Publications

Challenges and compromises: Predicting unbound antibody structures with deep learning.

Curr Opin Struct Biol

January 2025

Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom.

Therapeutic antibodies are manufactured, stored and administered in the free state; this makes understanding the unbound form key to designing and improving development pipelines. Prediction of unbound antibodies is challenging, specifically modelling of the CDRH3 loop, where inaccuracies are potentially worse due to a bias in structural data towards antibody-antigen complexes. This class imbalance provides a challenge for deep learning models trained on this data, potentially limiting generalisation to unbound forms.

In the Imbalanced Multivariate Time Series Classification (ImMTSC) task, minority-class instances typically correspond to critical events, such as system faults in power grids or abnormal health occurrences in medical monitoring. Despite being rare and random, these events are highly significant. The dynamic spatial-temporal relationships between minority-class instances and other instances make them more prone to interference from neighboring instances during classification.

Background and Objectives: Hypertensive Retinopathy (HR) is a retinal manifestation resulting from persistently elevated blood pressure. Severity grading of HR is essential for patient risk stratification, effective management, progression monitoring, timely intervention, and minimizing the risk of vision impairment. Computer-aided diagnosis and artificial intelligence (AI) systems play vital roles in the diagnosis and grading of HR.

Machine Learning-Based Alzheimer's Disease Stage Diagnosis Utilizing Blood Gene Expression and Clinical Data: A Comparative Investigation.

Diagnostics (Basel)

January 2025

Department of Computer Science and Engineering, Faculty of Engineering and Technology, Technology Campus (Peenya Campus), Ramaiah University of Applied Sciences, Bengaluru 560058, India.

This study presents a comparative analysis of the multistage diagnosis of Alzheimer's disease (AD), including mild cognitive impairment (MCI), utilizing two distinct types of biomarkers: blood gene expression and clinical biomarker samples. Both of these samples, obtained from participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI), were independently analyzed utilizing machine learning (ML)-based multiclassifiers. This study applied novel machine learning-based data augmentation techniques to gene expression profile data that are high-dimensional, low-sample-size (HDLSS) and inherently highly imbalanced.

Comparison of Resampling Methods and Radiomic Machine Learning Classifiers for Predicting Bone Quality Using Dual-Energy X-Ray Absorptiometry.

Diagnostics (Basel)

January 2025

Instituto de Investigación en Tecnología Informática Avanzada, Universidad Nacional del Centro de la Provincia de Buenos Aires, Tandil 7000, Argentina.

This study presents a novel approach, based on a combination of radiomic feature extraction, data resampling techniques, and machine learning algorithms, for the detection of degraded bone structures in Dual X-ray Absorptiometry (DXA) images. This comprehensive approach, which addresses the critical aspects of the problem, distinguishes this work from previous studies, improving the performance achieved by the most similar studies. The primary aim is to provide clinicians with an accessible tool for quality bone assessment, which is currently limited.
