Random forests (RFs) are a widely used modelling tool capable of feature selection via a variable importance measure (VIM); however, a threshold is needed to control for false positives. In the absence of a good understanding of the characteristics of VIMs, many current approaches attempt to select features associated with the response by training multiple RFs to generate statistical power via a permutation null, by employing recursive feature elimination, or through a combination of both. For high-dimensional datasets, however, these approaches become computationally infeasible. In this paper, we present RFlocalfdr, a statistical approach, built on the empirical Bayes argument of Efron, for thresholding mean decrease in impurity (MDI) importances. It identifies features significantly associated with the response while controlling the false positive rate. Using synthetic data and real-world health data, we demonstrate that RFlocalfdr has equivalent accuracy to currently published approaches, while being orders of magnitude faster. We show that RFlocalfdr can successfully threshold a dataset of 10 datapoints, establishing its usability for large-scale datasets, such as those in genomics. Furthermore, RFlocalfdr is compatible with any RF implementation that returns a VIM and counts, making it a versatile feature selection tool that reduces false discoveries.
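The empirical Bayes idea the abstract refers to can be illustrated with a minimal sketch of a local false discovery rate, fdr(z) = pi0 * f0(z) / f(z), where f is the density of all statistics and f0 the density of the null ones. This is not the RFlocalfdr implementation: the normal null fitted by median and MAD, the kernel density estimate for f, the conservative choice pi0 = 1, and the names `normal_pdf` and `local_fdr` are all assumptions made for illustration.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def local_fdr(z, bandwidth=None):
    """Conservative local fdr estimate for each statistic in z (assumes pi0 = 1)."""
    n = len(z)
    # Empirical null: robust centre and scale, so the (assumed few)
    # non-null features have little influence on the fit.
    mu0 = statistics.median(z)
    sigma0 = 1.4826 * statistics.median([abs(v - mu0) for v in z])
    # Mixture density f: Gaussian kernel density estimate with a
    # rule-of-thumb bandwidth when none is supplied.
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(z) * n ** (-0.2)
    fdr = []
    for v in z:
        f_v = sum(normal_pdf(v, w, bandwidth) for w in z) / n
        fdr.append(min(1.0, normal_pdf(v, mu0, sigma0) / f_v))
    return fdr


if __name__ == "__main__":
    # Simulated statistics: 950 null features and 50 associated features.
    random.seed(0)
    z = [random.gauss(0, 1) for _ in range(950)] + [random.gauss(6, 1) for _ in range(50)]
    selected = [i for i, q in enumerate(local_fdr(z)) if q < 0.2]
    print(len(selected), "features below the 0.2 local fdr cutoff")
```

The 0.2 cutoff is a common convention for local fdr rather than a value taken from the paper. The appeal of this kind of thresholding is that it needs only one set of importances, so no additional forests have to be trained to build a permutation null.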
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497997 | PMC |
| http://dx.doi.org/10.1016/j.csbj.2023.08.033 | DOI Listing |
Theor Appl Genet
January 2025
Horticultural Sciences Department, University of Florida, Gainesville, FL, 32611, USA.
In tetraploid F1 populations, traditional segregation distortion tests often inaccurately flag SNPs due to ignoring polyploid meiosis processes and genotype uncertainty. We develop tests that account for these factors. Genotype data from tetraploid F1 populations are often collected in breeding programs for mapping and genomic selection purposes.
Aging Clin Exp Res
January 2025
Department of Spine Surgery, Honghui Hospital, Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China.
Objective: This study aims to analyze adverse drug events (ADEs) related to romosozumab reported in the FAERS database from the second quarter of 2019 to the third quarter of 2023.
Methods: The ADE data related to romosozumab from 2019 Q2 to 2023 Q3 were collected. After data normalization, four signal strength quantification algorithms were used: ROR (Reporting Odds Ratios), PRR (Proportional Reporting Ratios), BCPNN (Bayesian Confidence Propagation Neural Network), and EBGM (Empirical Bayesian Geometric Mean).
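Two of the disproportionality measures named above, ROR and PRR, follow from a 2x2 table of report counts and can be sketched as below. The cell labels `a`, `b`, `c`, `d`, the function names, and the example counts are hypothetical and are not taken from the study; the confidence interval uses the standard normal approximation on the log scale.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """ROR with a 95% confidence interval.

    a: reports with the drug and the event of interest
    b: reports with the drug and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se_log)
    upper = math.exp(math.log(ror) + 1.96 * se_log)
    return ror, lower, upper


def proportional_reporting_ratio(a, b, c, d):
    """PRR: event share among the drug's reports over the share among all other reports."""
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    a, b, c, d = 20, 80, 100, 9800
    print(reporting_odds_ratio(a, b, c, d))
    print(proportional_reporting_ratio(a, b, c, d))
```

A signal is usually flagged only when the lower confidence bound exceeds 1 and a minimum number of reports is met; the exact criteria vary between studies.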
Mol Autism
January 2025
Department of Special Education, University of Haifa, Haifa, Israel.
Background: Alterations in sensory perception, a core phenotype of autism, are attributed to imbalanced integration of sensory information and prior knowledge during perceptual statistical (Bayesian) inference. This hypothesis has gained momentum in recent years, partly because it can be implemented both at the computational level, as in Bayesian perception, and at the level of canonical neural microcircuitry, as in predictive coding. However, empirical investigations have yielded conflicting results with evidence remaining limited.
Biol Psychiatry Cogn Neurosci Neuroimaging
January 2025
Department of Psychiatry, University of Pittsburgh School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
Background: Effective connectivity (EC) analysis provides valuable insights into the directionality of neural interactions, crucial for understanding the mechanisms underlying cognitive and emotional regulation in depressive and anxiety disorders. This study examined EC within key neural networks during working memory (WM) and emotional regulation (ER) tasks in young adults, both healthy and seeking help from mental health professionals for emotional distress.
Methods: Dynamic Causal Modeling (DCM) was employed to analyze EC in two independent samples (n=97 and n=94).
Strong sex differences exist in sleep phenotypes and also cardiovascular diseases (CVDs). However, sex-specific causal effects of sleep phenotypes on CVD-related outcomes have not been thoroughly examined. Mendelian randomization (MR) analysis is a useful approach for estimating the causal effect of a risk factor on an outcome of interest when interventional studies are not available.