Comparative study of imputation strategies to improve the sarcopenia prediction task.

Digit Health

Department of Exercise Rehabilitation & Welfare, Gachon University, Incheon, Republic of Korea.

Published: January 2025

Objective: Sarcopenia, a condition characterized by the progressive loss of skeletal muscle mass and strength, poses significant challenges in research due to missing data. Incomplete datasets undermine the accuracy and reliability of studies, necessitating effective imputation techniques. This study conducts a comparative analysis of three advanced methods (multiple imputation by chained equations (MICE), support vector regression, and K-nearest neighbors (KNN)) to address data completeness issues in sarcopenia research.
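As an illustration of the three imputation families named in the Objective, the following is a minimal sketch using scikit-learn. It is not the authors' implementation: the synthetic data, the 15% missingness rate, the hyperparameters, and the choice to wrap support vector regression inside `IterativeImputer` are all assumptions made for this example.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to import IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for a clinical feature table (NOT the study's data):
# four correlated features driven by one shared latent factor.
n_samples, n_features = 200, 4
latent = rng.normal(size=(n_samples, 1))
X_true = latent + 0.5 * rng.normal(size=(n_samples, n_features))

# Assumption: about 15% of entries are missing completely at random.
mask = rng.random(X_true.shape) < 0.15
X_missing = X_true.copy()
X_missing[mask] = np.nan

# Three imputers loosely matching the methods named in the abstract.
imputers = {
    "MICE-style": IterativeImputer(max_iter=10, random_state=0),
    "SVR-based": IterativeImputer(estimator=SVR(), max_iter=5, random_state=0),
    "KNN": KNNImputer(n_neighbors=5),
}

imputed = {}
rmse = {}
for name, imputer in imputers.items():
    X_filled = imputer.fit_transform(X_missing)
    imputed[name] = X_filled
    # Error is measured only on the entries that were artificially removed.
    rmse[name] = float(np.sqrt(np.mean((X_filled[mask] - X_true[mask]) ** 2)))
    print(f"{name}: RMSE on masked entries = {rmse[name]:.3f}")
```

Masking known values and scoring the reconstruction is one common way to compare imputers when the ground truth is otherwise unavailable; the study itself may have evaluated the methods differently.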

Methods: Following imputation, we utilized machine learning models, including logistic regression, gradient boosting, support vector machine, and random forest, to classify sarcopenia. The methodology encompassed rigorous data preprocessing, normalization, and the synthetic minority oversampling technique to address class imbalance and ensure unbiased model performance.
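The workflow described in the Methods (normalization, oversampling of the minority class, then classification with four model families) can be sketched as follows. This is a hedged illustration, not the study's pipeline: the dataset is synthetic, `smote_like_oversample` is a hypothetical helper that implements only the core interpolation idea behind SMOTE, and applying scaling and oversampling to the training split only is an assumption of this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote_like_oversample(X, y, k=5, seed=0):
    """Hypothetical helper: balance a binary problem by interpolating
    between minority-class samples and their nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment base -> neighbour
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic imbalanced data standing in for an already-imputed table.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Fit the scaler on the training split only, then oversample that split.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
X_bal, y_bal = smote_like_oversample(X_train_s, y_train)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "support vector machine": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    scores[name] = f1_score(y_test, model.predict(X_test_s))
    print(f"{name}: F1 on held-out data = {scores[name]:.3f}")
```

Keeping the test split untouched by oversampling avoids scoring the models on synthetic samples. In practice, a maintained implementation such as the SMOTE class in the imbalanced-learn package would usually be preferred over a hand-written helper.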

Results: The results revealed substantial variations in model accuracy based on the imputation method employed. The gradient boosting model consistently exhibited superior performance across all imputation strategies, demonstrating its robustness with imputed datasets. Additionally, KNN and MICE emerged as effective imputation techniques, preserving the original data distribution and enabling more accurate classification outcomes.

Conclusion: This study underscores the pivotal role of imputation methods in maintaining data integrity and enhancing predictive accuracy in sarcopenia research. The gradient boosting model's reliability across all strategies highlights its potential as a robust classifier, while the suitability of KNN and MICE for preserving data distribution supports their application in similar research contexts. These findings contribute to more reliable and valid insights in sarcopenia studies, ultimately supporting improved clinical outcomes.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748086
DOI: http://dx.doi.org/10.1177/20552076241301960

Similar Publications

Background: Retail involves directly delivering goods and services to end consumers. Natural disasters and epidemics/pandemics have significant potential to disrupt supply chains, leading to shortages, forecasting errors, price increases, and substantial financial strains on retailers. The COVID-19 pandemic highlighted the need for retail sectors to prepare for crisis impacts on sales forecasts by regularly assessing and adjusting sales volumes, consumer behavior, and forecasting models to adapt to changing conditions.

A bird's-eye view of the biological mechanism and machine learning prediction approaches for cell-penetrating peptides.

Front Artif Intell

January 2025

Department of Genetic Engineering, Computational Biology Lab, School of Bioengineering, SRM Institute of Science and Technology, SRM Nagar, Chennai, India.

Cell-penetrating peptides (CPPs) are highly effective at passing through eukaryotic membranes with various cargo molecules, like drugs, proteins, nucleic acids, and nanoparticles, without causing significant harm. Creating drug delivery systems with CPP is associated with cancer, genetic disorders, and diabetes due to their unique chemical properties. Wet lab experiments in drug discovery methodologies are time-consuming and expensive.

Objective: This study aims to evaluate key factors influencing the short-term and long-term prognosis of stroke patients, with a particular focus on variables such as body weight, hemoglobin, electrolytes, kidney function, organ function scores, and comorbidities. Stroke poses a significant global health burden, and understanding its prognostic factors is crucial for clinical management.

Methods: This is a retrospective cohort study based on data from the MIMIC-IV database, including stroke patients from 2010 to 2020.

Deep learning captures the effect of epistasis in multifactorial diseases.

Front Med (Lausanne)

January 2025

International Laboratory of Bioinformatics, AI and Digital Sciences Institute, Faculty of Computer Science, HSE University, Moscow, Russia.

Background: Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by the linear regression model that assumes independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer's disease, diabetes, cardiovascular disease, cancer, and others, association between individual SNPs and disease could be non-linear due to epistatic interactions.
