Many datasets in statistical analyses contain missing values. Because omitting observations with missing entries can cause information loss or greatly reduce the sample size, imputation is usually preferable. However, imputation can also introduce bias and affect the quality and validity of subsequent analyses. Focusing on binary classification problems, we analyzed how imputing missing values under missing completely at random (MCAR) and missing at random (MAR) mechanisms, with different missingness patterns, affects the predictive performance of the subsequent classification. To this end, we compared imputation methods such as several multivariate imputation by chained equations (MICE) variants, missForest, Hot Deck and mean imputation with regard to the classification performance achieved with commonly used classifiers such as Random Forest, Extreme Gradient Boosting, Support Vector Machine and regularized logistic regression. Our simulation results showed that Random Forest-based imputation (i.e., MICE Random Forest and missForest) performed particularly well in most of the scenarios studied. Besides these two methods, simple mean imputation also proved useful, especially when many features (covariates) contained missing values.
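To make the MCAR/MAR distinction and the two simplest imputation methods concrete, here is a minimal stdlib-only Python sketch. It is not the authors' simulation design, which the abstract does not describe; the data generator, the helper names (`make_data`, `mask_mcar`, `mask_mar`, `impute_mean`, `impute_random_hot_deck`) and the median-split MAR rule are hypothetical choices made for illustration. The hot deck shown is a *random* hot deck, one simple variant; the abstract does not say which Hot Deck variant was used. MICE, missForest and the classifiers are out of scope here.

```python
import random
import statistics


def make_data(n, seed=0):
    """Simulate n rows [x1, x2] with correlated features: x2 = 0.8 * x1 + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        rows.append([x1, 0.8 * x1 + rng.gauss(0.0, 0.6)])
    return rows


def column(rows, col):
    """Extract one column as a list (None marks a missing entry)."""
    return [r[col] for r in rows]


def mask_mcar(rows, col, rate, seed=1):
    """MCAR: delete each entry of `col` with probability `rate`, independent of all data."""
    rng = random.Random(seed)
    out = [r[:] for r in rows]
    for r in out:
        if rng.random() < rate:
            r[col] = None
    return out


def mask_mar(rows, col, driver, rate, seed=1):
    """MAR: missingness in `col` depends only on the fully observed column `driver`.

    Hypothetical rule: rows whose driver value lies above the median lose `col`
    with probability min(1, 2 * rate); rows at or below the median keep it, so
    about `rate` of the entries go missing overall when rate <= 0.5.
    """
    rng = random.Random(seed)
    cut = statistics.median(column(rows, driver))
    p_high = min(1.0, 2.0 * rate)
    out = [r[:] for r in rows]
    for r in out:
        if r[driver] > cut and rng.random() < p_high:
            r[col] = None
    return out


def impute_mean(rows, col):
    """Mean imputation: replace every gap in `col` with the mean of the observed values."""
    mu = statistics.mean([v for v in column(rows, col) if v is not None])
    return [[mu if (j == col and v is None) else v for j, v in enumerate(r)] for r in rows]


def impute_random_hot_deck(rows, col, seed=2):
    """Random hot deck: replace every gap in `col` with the value of a randomly drawn donor row."""
    rng = random.Random(seed)
    donors = [v for v in column(rows, col) if v is not None]
    return [[rng.choice(donors) if (j == col and v is None) else v for j, v in enumerate(r)]
            for r in rows]


if __name__ == "__main__":
    full = make_data(2000)
    x2 = column(full, 1)
    print(f"complete data: mean {statistics.mean(x2):+.3f}, variance {statistics.pvariance(x2):.3f}")
    for name, masked in [("MCAR", mask_mcar(full, col=1, rate=0.3)),
                         ("MAR", mask_mar(full, col=1, driver=0, rate=0.3))]:
        observed = [v for v in column(masked, 1) if v is not None]
        mean_filled = column(impute_mean(masked, 1), 1)
        hot_filled = column(impute_random_hot_deck(masked, 1), 1)
        print(f"{name}: {len(observed)} observed, observed mean {statistics.mean(observed):+.3f}, "
              f"variance after mean imputation {statistics.pvariance(mean_filled):.3f}, "
              f"after hot deck {statistics.pvariance(hot_filled):.3f}")
```

In this setup, MAR removes x2 mostly from rows with high x1, so the observed x2 values (and anything imputed from them alone) are expected to sit below the complete-data mean. Mean imputation also places every filled entry at a single value, which shrinks the variance, while the random hot deck draws from observed donors and roughly keeps their spread. Neither fill step uses x1; methods that model x2 from the other features, such as MICE or missForest, can in principle exploit that relationship.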

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048089
DOI: http://dx.doi.org/10.3390/e25030521

Similar Publications

Assessing myocardial viability is crucial for managing ischemic heart disease. While late gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) is the gold standard for viability evaluation, it has limitations, including contraindications in patients with renal dysfunction and lengthy scan times. This study investigates the potential of non-contrast CMR techniques (feature-tracking strain analysis and T1/T2 mapping), combined with machine learning (ML) models, as an alternative to LGE-CMR for myocardial viability assessment.

Active transportation, such as cycling, improves mobility and general health. However, statistics reveal that in low- and middle-income countries, male and female cycling participation rates differ significantly. Existing literature highlights that women's willingness to use bicycles is significantly influenced by their perception of security.

Electric vehicles (EVs) rely heavily on lithium-ion battery packs as essential energy storage components. However, inconsistencies in cell characteristics and operating conditions can lead to imbalanced state of charge (SOC) levels, resulting in reduced capacity and accelerated degradation. This study presents an active cell balancing method optimized for both charging and discharging scenarios, aiming to equalize SOC across cells and improve overall pack performance.

Rapid and accurate multi-phenotype imputation for millions of individuals.

Nat Commun

January 2025

Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs & Fisheries College, Jimei University, Xiamen, Fujian, People's Republic of China.

Deep phenotyping can enhance the power of genetic analysis, including genome-wide association studies (GWAS), but the occurrence of missing phenotypes compromises the potential of such resources. Although many phenotypic imputation methods have been developed, the accurate imputation of millions of individuals remains challenging. In the present study, we have developed a multi-phenotype imputation method based on mixed fast random forest (PIXANT) by leveraging efficient machine learning (ML)-based algorithms.

Microbiota analysis of perimenopausal women experiencing recurrent vaginitis in conjunction with urinary tract infection.

BMC Microbiol

January 2025

Shanghai-MOST Key Laboratory of Health and Disease Genomics, NHC Key Lab of Reproduction Regulation, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, 200237, China.

Background: Recurrent vaginitis in conjunction with urinary tract infection (RV/UTI) in perimenopausal women is a common clinical condition that impacts both doctors and patients. Its pathogenesis is not completely known, but the urogenital microbiota is thought to be involved. We compared the urogenital and gut microbiotas of perimenopausal women experiencing RV/UTI with those of age-matched controls to provide a new microbiological perspective and scheme for solving clinical problems.
