Diabetes raises blood sugar levels, which damages various parts of the human body. Diabetes data are used not only to gain a deeper understanding of treatment mechanisms but also to predict the probability that a person will develop the disease. This paper proposes a novel methodology for classification under heavy class imbalance, as observed in the PIMA diabetes dataset. The proposed methodology uses two novel steps, resampling and random shuffling, before the classification model is defined. The methodology is tested with two versions of cross validation that are appropriate in cases of class imbalance: k-fold cross validation and stratified k-fold cross validation. Our findings suggest that, with imbalanced data, randomly shuffling the data prior to a train/test split can help improve estimation metrics. Our methodology can outperform existing machine learning algorithms and complex deep learning models. Applying it is a simple and fast way to predict labels under class imbalance. It does not require additional techniques to balance classes, and it does not involve preselecting important variables, which saves time and makes the model easy to analyze. This makes it an effective methodology for initial and further modeling of data with class imbalance. Moreover, our methodology shows how to increase the effectiveness of machine learning models based on standard approaches and make them more reliable.
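The effect of shuffling before a k-fold split, and the difference from a stratified split, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, and the label vector (500 negatives followed by 268 positives), the fold count, the seed, and the helper names `kfold`, `stratified_kfold`, and `positive_rates` are all assumptions chosen for illustration. No classifier is trained; the sketch only compares the share of positive labels per fold.

```python
import random


def kfold(n, k, shuffle=True, seed=0):
    """Split indices 0..n-1 into k contiguous folds, optionally shuffling first."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    size, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        # The first `extra` folds receive one additional sample.
        stop = start + size + (1 if f < extra else 0)
        folds.append(idx[start:stop])
        start = stop
    return folds


def stratified_kfold(labels, k, seed=0):
    """Deal each class's shuffled indices round-robin into k folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def positive_rates(labels, folds):
    """Fraction of positive (1) labels in each fold."""
    return [sum(labels[i] for i in fold) / len(fold) for fold in folds]


# Synthetic, class-sorted labels (an assumption, not the real dataset).
labels = [0] * 500 + [1] * 268
k = 5
overall = sum(labels) / len(labels)

unshuffled = positive_rates(labels, kfold(len(labels), k, shuffle=False))
shuffled = positive_rates(labels, kfold(len(labels), k, shuffle=True))
stratified = positive_rates(labels, stratified_kfold(labels, k))

print("overall positive rate:", round(overall, 3))
print("unshuffled k-fold:    ", [round(r, 3) for r in unshuffled])
print("shuffled k-fold:      ", [round(r, 3) for r in shuffled])
print("stratified k-fold:    ", [round(r, 3) for r in stratified])
```

On class-sorted data the unshuffled split produces folds that contain a single class, so a model evaluated on them would be scored on an unrepresentative test set. Shuffling brings each fold's positive rate close to the overall rate on average, and stratification keeps it within one sample of that rate by construction.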
Source (DOI): http://dx.doi.org/10.3390/bioengineering12010035
Curr Opin Struct Biol
January 2025
Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, United Kingdom.
Therapeutic antibodies are manufactured, stored and administered in the free state; this makes understanding the unbound form key to designing and improving development pipelines. Prediction of unbound antibodies is challenging, specifically modelling of the CDRH3 loop, where inaccuracies are potentially worse due to a bias in structural data towards antibody-antigen complexes. This class imbalance provides a challenge for deep learning models trained on this data, potentially limiting generalisation to unbound forms.
Neural Netw
January 2025
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430070, Hubei, China.
In the Imbalanced Multivariate Time Series Classification (ImMTSC) task, minority-class instances typically correspond to critical events, such as system faults in power grids or abnormal health occurrences in medical monitoring. Despite being rare and random, these events are highly significant. The dynamic spatial-temporal relationships between minority-class instances and other instances make them more prone to interference from neighboring instances during classification.
Comput Methods Programs Biomed
January 2025
Regional Institute of Ophthalmology, Indira Gandhi Institute of Medical Sciences, Patna, 800025, Bihar, India.
Background and Objectives: Hypertensive Retinopathy (HR) is a retinal manifestation resulting from persistently elevated blood pressure. Severity grading of HR is essential for patient risk stratification, effective management, progression monitoring, timely intervention, and minimizing the risk of vision impairment. Computer-aided diagnosis and artificial intelligence (AI) systems play vital roles in the diagnosis and grading of HR.
Diagnostics (Basel)
January 2025
Department of Computer Science and Engineering, Faculty of Engineering and Technology, Technology Campus (Peenya Campus), Ramaiah University of Applied Sciences, Bengaluru 560058, India.
This study presents a comparative analysis of the multistage diagnosis of Alzheimer's disease (AD), including mild cognitive impairment (MCI), utilizing two distinct types of biomarkers: blood gene expression and clinical biomarker samples. Both of these samples, obtained from participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI), were independently analyzed utilizing machine learning (ML)-based multiclassifiers. This study applied novel machine learning-based data augmentation techniques to gene expression profile data that are high-dimensional, low-sample-size (HDLSS) and inherently highly imbalanced.
Diagnostics (Basel)
January 2025
Instituto de Investigación en Tecnología Informática Avanzada, Universidad Nacional del Centro de la Provincia de Buenos Aires, Tandil 7000, Argentina.
This study presents a novel approach, based on a combination of radiomic feature extraction, data resampling techniques, and machine learning algorithms, for the detection of degraded bone structures in Dual X-ray Absorptiometry (DXA) images. This comprehensive approach, which addresses the critical aspects of the problem, distinguishes this work from previous studies, improving the performance achieved by the most similar studies. The primary aim is to provide clinicians with an accessible tool for quality bone assessment, which is currently limited.