Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data.

Lingjing Jiang, Niina Haiminen, Anna-Paola Carrieri, Shi Huang, Yoshiki Vázquez-Baeza, Laxmi Parida, Ho-Cheol Kim, Austin D Swafford, Rob Knight, Loki Natarajan

Biometrics

Division of Biostatistics, University of California San Diego, La Jolla, California, USA.

Published: September 2022

Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging because microbiome data sets are high dimensional, underdetermined, sparse and compositional. Considerable effort has recently gone into developing feature selection methods that handle these data characteristics, but almost all such methods have been evaluated on the performance of their model predictions. Little attention, however, has been paid to a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods are tuned for model fit, but the ability to identify meaningful subsets of features cannot be judged by prediction accuracy alone. If tiny changes to the data lead to large changes in the chosen feature subset, many of the selected features are likely to be data artifacts rather than real biological signal. This need to identify relevant and reproducible features motivates reproducibility criteria such as Stability, which quantifies how robust a method's feature selection is to perturbations in the data. In this paper, we compare popular model prediction metrics (MSE or AUC) with the proposed reproducibility criterion, Stability, for evaluating four widely used feature selection methods, in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is preferable to model prediction metrics as a feature selection criterion because it better quantifies the reproducibility of the feature selection method.
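The abstract does not spell out how Stability is computed. As a minimal sketch, assuming the widely used estimator of Nogueira, Sechidis and Brown (2018), which may differ in detail from the paper's definition, one can rerun a selector on bootstrap resamples, record which features each run picks in a binary matrix, and score the agreement across runs. The helpers `bootstrap_selections` and `top_k_covariance` are hypothetical illustrations, and the simulated Gaussian data is not microbiome data.

```python
import numpy as np


def nogueira_stability(Z):
    """Stability estimator (assumed form, after Nogueira, Sechidis & Brown 2018).

    Z is an M x d binary matrix: row i marks the features selected when the
    method was run on the i-th perturbed copy of the data.
    Returns 1.0 when every run selects the same subset; values near 0 mean
    agreement no better than random subsets of the same average size.
    """
    Z = np.asarray(Z, dtype=float)
    M, d = Z.shape
    if M < 2:
        raise ValueError("need at least two feature subsets")
    p_hat = Z.mean(axis=0)                   # selection frequency of each feature
    s2 = M / (M - 1) * p_hat * (1 - p_hat)   # unbiased per-feature selection variance
    k_bar = Z.sum(axis=1).mean()             # average number of selected features
    denom = (k_bar / d) * (1 - k_bar / d)    # variance expected under random selection
    if denom == 0:
        raise ValueError("undefined when every run selects no features or all features")
    return 1 - s2.mean() / denom


def bootstrap_selections(X, y, select, n_boot=50, seed=0):
    """Hypothetical helper: run `select(X, y) -> feature indices` on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = np.zeros((n_boot, d), dtype=int)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample subjects with replacement
        Z[b, select(X[idx], y[idx])] = 1     # mark the features chosen in this run
    return Z


def top_k_covariance(X, y, k=5):
    """Hypothetical toy selector: keep the k features with the largest |cov(x_j, y)|."""
    scores = np.abs((X - X.mean(axis=0)).T @ (y - y.mean()))
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))           # simulated Gaussian features, not real counts
    y = X[:, 0] - X[:, 1] + rng.normal(size=100)
    Z = bootstrap_selections(X, y, lambda Xb, yb: top_k_covariance(Xb, yb, k=5))
    print(f"Stability: {nogueira_stability(Z):.3f}")
```

Because Stability is computed only from the selected subsets, it can be reported alongside a cross-validated MSE or AUC for the same method, so that two selectors with similar predictive accuracy can still be distinguished by how reproducible their chosen features are.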

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787628
DOI: http://dx.doi.org/10.1111/biom.13481

Similar Publications

Use of the FHTHWA Index as a Novel Approach for Predicting the Incidence of Diabetes in a Japanese Population Without Diabetes: Data Analysis Study.

JMIR Med Inform

January 2025

Department of Endocrinology and Metabolism, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, China.

Jiao Wang Jianrong Chen Ying Liu Jixiong Xu

Background: Many tools have been developed to predict the risk of diabetes in a population without diabetes; however, these tools have shortcomings that include the omission of race, inclusion of variables that are not readily available to patients, and low sensitivity or specificity.

Objective: We aimed to develop and validate an easy, systematic index for predicting diabetes risk in the Asian population.

Methods: We collected the data from the NAGALA (NAfld [nonalcoholic fatty liver disease] in the Gifu Area, Longitudinal Analysis) database.

Decision tree-based learning and laboratory data mining: an efficient approach to amebiasis testing.

Parasit Vectors

January 2025

Faculty of Information Technology, Mutah University, Mutah, Jordan.

Enas Al-Khlifeh Ahmad S Tarawneh Khalid Almohammadi Malek Alrashidi Ramadan Hassanat

Background: Amebiasis represents a significant global health concern. This is especially evident in developing countries, where infections are more common. The primary diagnostic method in laboratories involves the microscopy of stool samples.

AI-based analysis of fetal growth restriction in a prospective obstetric cohort quantifies compound risks for perinatal morbidity and mortality and identifies previously unrecognized high risk clinical scenarios.

BMC Pregnancy Childbirth

January 2025

Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, University of Utah Health, 30 N. Mario Capecchi Dr., Level 5 South, Salt Lake City, UT, 84132, USA.

Raquel M Zimmerman Edgar J Hernandez Mark Yandell Martin Tristani-Firouzi Robert M Silver

Background: Fetal growth restriction (FGR) is a leading risk factor for stillbirth, yet the diagnosis of FGR confers considerable prognostic uncertainty, as most infants with FGR do not experience any morbidity. Our objective was to use data from a large, deeply phenotyped observational obstetric cohort to develop a probabilistic graphical model (PGM), a type of "explainable artificial intelligence (AI)", as a potential framework to better understand how interrelated variables contribute to perinatal morbidity risk in FGR.

Methods: Using data from 9,558 pregnancies delivered at ≥ 20 weeks with available outcome data, we derived and validated a PGM using randomly selected sub-cohorts of 80% (n = 7,645) and 20% (n = 1,912), respectively, to discriminate cases of FGR resulting in composite perinatal morbidity from those that did not.

Explore the factors related to the death of offspring under age five and appraise the hazard of child mortality using machine learning techniques in Bangladesh.

BMC Public Health

January 2025

Department of Statistics and Data Science, Jahangirnagar University, Dhaka, 1342, Bangladesh.

Ashikur Rahman Md Habibur Rahman

Background: Child mortality is a reliable and significant indicator of a nation's health. Although the child mortality rate in Bangladesh is declining over time, it still needs to drop even more in order to meet the Sustainable Development Goals (SDGs). Machine Learning models are one of the best tools for making more accurate and efficient forecasts and gaining in-depth knowledge.

Anatomical characterization of Semi-arid Bignoniaceae using light and scanning electron microscopy.

BMC Plant Biol

January 2025

Plant Production Department, College of Food and Agricultural Sciences, King Saud University, P.O. Box. 2460, Riyadh, 11451, Saudi Arabia.

Romisha Sonia Shabnum Shaheen Muhammad Waheed Sana Imran Shiekh Marifatul Haq

Background: The present research work was done to evaluate the anatomical differences among selected species of the family Bignoniaceae, as limited anatomical data is available for this family in Pakistan. Bignoniaceae is a remarkable family for its various medicinal properties and anatomical characterization is an important feature for the identification and classification of plants.

Methodology: In this study, several anatomical structures were examined, including stomata type and shape, leaf epidermis shape, epidermal cell size, and the presence or absence of trichomes and crystals (e.
