Background: Handling missing values is a crucial step in preprocessing data for machine learning. Most algorithms used for feature selection, classification, or estimation require complete datasets. Consequently, the usual strategy for dealing with missing values is to use only instances with complete data or to replace missing values with a mean, mode, median, or constant value. Discarding incomplete samples or replacing missing values with such basic techniques usually biases subsequent analyses of the dataset.
Aim: To demonstrate the positive impact of multivariate imputation on feature selection in datasets with missing values.
Results: We compared the outcome of feature selection on complete datasets, on incomplete datasets with missingness rates between 5% and 50%, and on datasets imputed by basic techniques and by multivariate imputation. The feature selection algorithms used are well-known methods. Datasets imputed by multivariate imputation gave the best feature selection results, compared with datasets imputed by basic techniques and with non-imputed incomplete datasets.
Conclusions: The evaluation results indicate that applying multivariate imputation by chained equations (MICE) reduces bias in the feature selection process.
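The contrast between basic and model-based imputation can be illustrated with a small sketch. It is not the authors' pipeline and it is not MICE itself: it shows a single deterministic regression-imputation step, the building block that chained-equation methods repeat over every incomplete variable (MICE additionally draws imputations from a predictive distribution and produces several imputed datasets). The synthetic data, the 30% missingness rate, the variable names, and the use of Pearson correlation as a stand-in for a filter-style feature score are all assumptions made for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation, used here as a simple filter-style feature score."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_impute(values):
    """Basic technique: replace every missing value (None) with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def regression_impute(predictor, values):
    """Fit least squares on complete cases, then predict the missing values."""
    pairs = [(p, v) for p, v in zip(predictor, values) if v is not None]
    mp = statistics.fmean([p for p, _ in pairs])
    mv = statistics.fmean([v for _, v in pairs])
    slope = (sum((p - mp) * (v - mv) for p, v in pairs)
             / sum((p - mp) ** 2 for p, _ in pairs))
    intercept = mv - slope * mp
    return [intercept + slope * p if v is None else v
            for p, v in zip(predictor, values)]


# Synthetic data (hypothetical): x1 is complete, x2 depends on x1,
# and the target depends on x2.
rng = random.Random(0)
n = 200
x1 = [rng.uniform(0, 10) for _ in range(n)]
x2_full = [2 * a + 1 + rng.gauss(0, 1) for a in x1]
target = [b + rng.gauss(0, 2) for b in x2_full]

# Remove about 30% of x2 completely at random.
x2_missing = [None if rng.random() < 0.3 else b for b in x2_full]

r_full = pearson(x2_full, target)
r_mean = pearson(mean_impute(x2_missing), target)
r_reg = pearson(regression_impute(x1, x2_missing), target)

print(f"score on complete data:        {r_full:.3f}")
print(f"score after mean imputation:   {r_mean:.3f}")
print(f"score after regression impute: {r_reg:.3f}")
```

In this setup, mean imputation pulls the score of `x2` well below its complete-data value, which could change its rank in a filter-based selection, while the regression-imputed score stays close to it. Deterministic regression imputation tends to overstate relationships between the imputed feature and its predictors, which is one reason methods such as MICE add random draws.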
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8318311 | PMC |
| http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0254720 | PLOS |
Abdom Radiol (NY)
January 2025
Department of Radiology, Taizhou Municipal Hospital, Taizhou, Zhejiang, China.
Background: To develop and validate a clinical-radiomics model for preoperative prediction of lymphovascular invasion (LVI) in rectal cancer.
Methods: This retrospective study included data from 239 patients with pathologically confirmed rectal adenocarcinoma from two centers, all of whom underwent MRI examinations. Cases from the first center (n = 189) were randomly divided into a training set and an internal validation set at a 7:3 ratio, while cases from the second center (n = 50) constituted the external validation set.
Bioinformatics
January 2025
Department of Pathology and Department of Immunobiology, Yale School of Medicine.
Summary: With the increased reliance on multi-omics data for bulk and single cell analyses, the availability of robust approaches to perform unsupervised learning for clustering, visualization, and feature selection is imperative. We introduce nipalsMCIA, an implementation of multiple co-inertia analysis (MCIA) for joint dimensionality reduction that solves the objective function using an extension to Non-linear Iterative Partial Least Squares (NIPALS). We applied nipalsMCIA to both bulk and single cell datasets and observed significant speed-up over other implementations for data with a large sample size and/or feature dimension.
BMC Geriatr
January 2025
Department of Cardiology, The Second Hospital & Clinical Medical School, Lanzhou University, No. 82 Cuiyingmen, Lanzhou, 730000, China.
Objective: To construct a predictive model for the occurrence of heart disease in elderly hypertensive individuals, with the aim of providing early risk identification.
Methods: A total of 934 participants aged 60 and above from the China Health and Retirement Longitudinal Study with a 7-year follow-up (2011-2018) were included. Machine learning methods (logistic regression, XGBoost, DNN) were employed to build a model predicting heart disease risk in hypertensive patients.
Sci Rep
January 2025
Department of Orthopaedics, Traditional Chinese Medical Hospital of Gansu Province, Qilihe District, Guazhou Street 418, Lanzhou, 730050, Gansu, China.
Knee osteoarthritis (KOA) represents a progressive degenerative disorder characterized by the gradual erosion of articular cartilage. This study aimed to develop and validate biomarker-based predictive models for KOA diagnosis using machine learning techniques. Clinical data from 2594 samples were obtained and stratified into training and validation datasets in a 7:3 ratio.
Sci Rep
January 2025
Department of Electronics and Communication Engineering, Panimalar Engineering College, Chennai, India.
The growing number of connected devices in smart home environments has amplified security risks, particularly from Man-in-the-Middle (MitM) attacks. These attacks allow cybercriminals to intercept and manipulate communication streams between devices, often remaining undetected. Traditional rule-based methods struggle to cope with the complexity of these attacks, creating a need for more advanced, adaptive intrusion detection systems.