Feature selection is indispensable in microbiome data analysis, but it is particularly challenging because microbiome data sets are high dimensional, underdetermined, sparse and compositional. Considerable effort has recently gone into developing feature selection methods that handle these data characteristics, but almost all of them have been evaluated on the performance of model predictions. Little attention has been paid to a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated from prediction accuracy alone. If tiny changes to the data lead to large changes in the chosen feature subset, then many of the selected features are likely to be data artifacts rather than real biological signal. This need to identify relevant and reproducible features motivates reproducibility criteria such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare popular model prediction metrics (MSE or AUC) with the proposed reproducibility criterion, Stability, in evaluating four widely used feature selection methods, in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is preferable to model prediction metrics as a feature selection criterion because it better quantifies the reproducibility of the feature selection method.
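To make the idea of Stability concrete, the sketch below scores how consistently a selector picks the same features across perturbed copies of the data (for example, bootstrap resamples). It uses one published estimator, the index of Nogueira, Sechidis and Brown (2018); that this matches the exact definition used in the paper is an assumption, since the abstract does not give a formula. The function name `stability_index` and the input layout (one row per resample, one column per feature, 1 if selected) are illustrative choices, not taken from the source.

```python
import numpy as np


def stability_index(selections):
    """Stability of feature selection across M perturbed data sets.

    selections: array-like of shape (M, d) with 0/1 entries, where
    selections[i, f] == 1 means feature f was selected on resample i.
    Returns a value that is at most 1; 1 means every resample selected
    exactly the same feature subset.
    Assumption: this is the Nogueira et al. (2018) estimator, used here
    only as an illustration of a stability criterion.
    """
    Z = np.asarray(selections, dtype=float)
    if Z.ndim != 2:
        raise ValueError("selections must be a 2-D (M, d) array")
    M, d = Z.shape
    if M < 2:
        raise ValueError("need at least two resamples")

    p_hat = Z.mean(axis=0)          # selection frequency of each feature
    k_bar = Z.sum(axis=1).mean()    # average number of selected features

    # Normalizer: per-feature variance expected if k_bar features were
    # drawn at random. It is zero when no feature or every feature is
    # always selected, in which case the index is undefined.
    denom = (k_bar / d) * (1.0 - k_bar / d)
    if denom == 0:
        raise ValueError("stability undefined: all or no features selected")

    # Unbiased sample variance of each feature's selection indicator.
    s2 = (M / (M - 1)) * p_hat * (1.0 - p_hat)
    return 1.0 - s2.mean() / denom


if __name__ == "__main__":
    # Toy example: three resamples over five features.
    toy = [[1, 1, 0, 0, 0],
           [1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0]]
    print(stability_index(toy))
```

In practice the rows would come from rerunning the same feature selection method on each resample; a method can reach a low MSE or high AUC while scoring poorly here if its selected subsets change from resample to resample.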
Source (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787628
Source (DOI): http://dx.doi.org/10.1111/biom.13481
JMIR Med Inform
January 2025
Department of Endocrinology and Metabolism, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, China.
Background: Many tools have been developed to predict the risk of diabetes in a population without diabetes; however, these tools have shortcomings that include the omission of race, inclusion of variables that are not readily available to patients, and low sensitivity or specificity.
Objective: We aimed to develop and validate an easy, systematic index for predicting diabetes risk in the Asian population.
Methods: We collected the data from the NAGALA (NAfld [nonalcoholic fatty liver disease] in the Gifu Area, Longitudinal Analysis) database.
Parasit Vectors
January 2025
Faculty of Information Technology, Mutah University, Mutah, Jordan.
Background: Amebiasis represents a significant global health concern. This is especially evident in developing countries, where infections are more common. The primary diagnostic method in laboratories involves the microscopy of stool samples.
BMC Pregnancy Childbirth
January 2025
Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, University of Utah Health, 30 N. Mario Capecchi Dr., Level 5 South, Salt Lake City, UT, 84132, USA.
Background: Fetal growth restriction (FGR) is a leading risk factor for stillbirth, yet the diagnosis of FGR confers considerable prognostic uncertainty, as most infants with FGR do not experience any morbidity. Our objective was to use data from a large, deeply phenotyped observational obstetric cohort to develop a probabilistic graphical model (PGM), a type of "explainable artificial intelligence (AI)", as a potential framework to better understand how interrelated variables contribute to perinatal morbidity risk in FGR.
Methods: Using data from 9,558 pregnancies delivered at ≥ 20 weeks with available outcome data, we derived and validated a PGM using randomly selected sub-cohorts of 80% (n = 7645) and 20% (n = 1,912), respectively, to discriminate cases of FGR resulting in composite perinatal morbidity from those that did not.
BMC Public Health
January 2025
Department of Statistics and Data Science, Jahangirnagar University, Dhaka, 1342, Bangladesh.
Background: Child mortality is a reliable and significant indicator of a nation's health. Although the child mortality rate in Bangladesh is declining over time, it still needs to drop even more in order to meet the Sustainable Development Goals (SDGs). Machine Learning models are one of the best tools for making more accurate and efficient forecasts and gaining in-depth knowledge.
BMC Plant Biol
January 2025
Plant Production Department, College of Food and Agricultural Sciences, King Saud University, P.O. Box. 2460, Riyadh, 11451, Saudi Arabia.
Background: The present research work was done to evaluate the anatomical differences among selected species of the family Bignoniaceae, as limited anatomical data is available for this family in Pakistan. Bignoniaceae is a remarkable family for its various medicinal properties and anatomical characterization is an important feature for the identification and classification of plants.
Methodology: In this study, several anatomical structures were examined, including stomata type and shape, leaf epidermis shape, epidermal cell size, and the presence or absence of trichomes and crystals (e.