Extrapolated cross-validation for randomized ensembles.

J Comput Graph Stat

Department of Statistics and Data Science, Carnegie Mellon University.

Published: January 2024

Ensemble methods such as bagging and random forests are ubiquitous in fields ranging from finance to genomics. Despite their prevalence, the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial risk estimators for small ensemble sizes, computed from out-of-bag errors, and a novel risk extrapolation technique that leverages the structure of the prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields δ-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general predictors, requires only mild moment assumptions, and allows for high-dimensional regimes in which the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expression in single-cell multiomics using random forests under a computational constraint on the maximum ensemble size. Compared to sample-split and K-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. Meanwhile, its computational cost is considerably lower owing to the risk extrapolation technique.
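The extrapolation idea can be illustrated under one standard assumption: if the M component predictors are conditionally i.i.d. given the data (as when each is fit on an independently drawn subsample), expanding the squared error of their average gives R_M = -(1 - 2/M) R_1 + 2(1 - 1/M) R_2. Risks for every M then follow from estimates at M = 1 and M = 2, with limit R_∞ = 2 R_2 - R_1 and gap R_M - R_∞ = (2/M)(R_1 - R_2). The Python sketch below is a minimal illustration under those assumptions, not the authors' implementation. It uses ordinary least squares as a stand-in base predictor, and every function name (`fit_ols`, `oob_risk_estimates`, `extrapolate_risk`, `select_ensemble_size`, `ecv_tune`) is hypothetical. The rule for choosing the subsample size k (minimize the extrapolated risk at the maximum ensemble size) is likewise an assumption.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Hypothetical base predictor: least squares (minimum-norm if underdetermined)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def oob_risk_estimates(X, y, k, m0, rng):
    """Out-of-bag estimates of R_1 and R_2 from m0 subsamples of size k (no replacement).

    Assumes k is well below n/2 so that some points fall outside every pair of bags.
    """
    n = X.shape[0]
    masks, coefs = [], []
    for _ in range(m0):
        idx = rng.choice(n, size=k, replace=False)
        mask = np.zeros(n, dtype=bool)
        mask[idx] = True
        masks.append(mask)
        coefs.append(fit_ols(X[idx], y[idx]))

    # R_1: squared error of each single predictor on the points outside its own bag.
    r1 = [np.mean((y[~m] - X[~m] @ c) ** 2) for m, c in zip(masks, coefs)]

    # R_2: squared error of each two-member ensemble on the points outside both bags.
    r2 = []
    for i, j in itertools.combinations(range(m0), 2):
        oob = ~(masks[i] | masks[j])
        if oob.any():
            pred = 0.5 * (X[oob] @ coefs[i] + X[oob] @ coefs[j])
            r2.append(np.mean((y[oob] - pred) ** 2))
    if not r2:
        raise ValueError("no point lies outside two bags; use a smaller k")
    return float(np.mean(r1)), float(np.mean(r2))


def extrapolate_risk(r1, r2, m):
    """Extrapolated risk of an M-member ensemble: -(1 - 2/M) R_1 + 2 (1 - 1/M) R_2."""
    return -(1 - 2 / m) * r1 + 2 * (1 - 1 / m) * r2


def select_ensemble_size(r1, r2, delta, m_max):
    """Smallest M with R_M - R_inf = (2/M)(R_1 - R_2) <= delta, capped at m_max."""
    gap = max(r1 - r2, 0.0)
    m = int(np.ceil(2 * gap / delta))
    return max(1, min(m, m_max))


def ecv_tune(X, y, k_grid, m0, m_max, delta, rng):
    """Assumed rule: pick k minimizing extrapolated risk at m_max, then choose M for that k."""
    best = None
    for k in k_grid:
        r1, r2 = oob_risk_estimates(X, y, k, m0, rng)
        risk = extrapolate_risk(r1, r2, m_max)
        if best is None or risk < best[2]:
            best = (k, select_ensemble_size(r1, r2, delta, m_max), risk)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 5))
    y = X @ np.ones(5) + rng.standard_normal(400)
    k, m, risk = ecv_tune(X, y, k_grid=[50, 100, 150], m0=10, m_max=500, delta=0.01, rng=rng)
    print(f"subsample size k={k}, ensemble size M={m}, extrapolated risk at M_max={risk:.3f}")
```

Only m0 component fits per candidate k are needed, which reflects the cost saving described in the abstract: risks for larger ensembles come from the extrapolation formula rather than from fitting and validating those ensembles directly. For example, with R_1 = 1.5, R_2 = 1.0 and δ = 0.25, the smallest ensemble within δ of the limiting risk is M = 4.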


Source: PMC11492369, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11492369
DOI: 10.1080/10618600.2023.2288194, http://dx.doi.org/10.1080/10618600.2023.2288194
