Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes postprediction inference. In this paper, we develop methods for correcting statistical inference using outcomes predicted with arbitrarily complicated machine-learning models including random forests and deep neural nets. Rather than trying to derive the correction from first principles for each machine-learning algorithm, we observe that there is typically a low-dimensional and easily modeled representation of the relationship between the observed and predicted outcomes. We build an approach for postprediction inference that naturally fits into the standard machine-learning framework where the data are divided into training, testing, and validation sets. We train the prediction model in the training set, estimate the relationship between the observed and predicted outcomes in the testing set, and use that relationship to correct subsequent inference in the validation set. We show that our postprediction inference (postpi) approach can correct bias and improve variance estimation and subsequent statistical inference with predicted outcomes. To demonstrate the broad applicability of our approach, we show that postpi can improve inference in two distinct fields: modeling predicted phenotypes in repurposed gene expression data and modeling predicted causes of death in verbal autopsy data. Our method is available through an open-source R package: https://github.com/leekgroup/postpi.
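
The three-step workflow described above (train, estimate the relationship, correct) can be illustrated with a minimal sketch. This is not the postpi package or its API. It is a simplified, bootstrap-style illustration under stated assumptions: the data are simulated, a linear least-squares fit stands in for an arbitrary machine-learning predictor, the relationship between observed and predicted outcomes is assumed to be linear with Gaussian noise, and all function and variable names (`simulate`, `ols_with_se`, `run_sketch`) are hypothetical.

```python
import numpy as np


def ols_with_se(X, y):
    """Ordinary least squares; returns coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def design(x):
    """Design matrix with an intercept and one covariate."""
    return np.column_stack([np.ones_like(x), x])


def simulate(n, rng):
    """Simulated data (an assumption of this sketch): true slope of y on x is 2."""
    x = rng.normal(size=n)  # covariate of interest in the downstream analysis
    z = rng.normal(size=n)  # extra feature seen only by the predictor
    y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)
    return x, z, y


def run_sketch(seed=0, n=500, n_boot=500):
    rng = np.random.default_rng(seed)
    x_tr, z_tr, y_tr = simulate(n, rng)  # training set
    x_te, z_te, y_te = simulate(n, rng)  # testing set
    x_va, z_va, _ = simulate(n, rng)     # validation set: outcome treated as unobserved

    def features(x, z):
        return np.column_stack([np.ones_like(x), x, z])

    # Step 1: train the prediction model on the training set
    # (linear stand-in for a random forest or neural net).
    w, _ = ols_with_se(features(x_tr, z_tr), y_tr)

    # Step 2: model observed vs. predicted outcomes on the testing set.
    yhat_te = features(x_te, z_te) @ w
    gamma, _ = ols_with_se(design(yhat_te), y_te)
    sigma = (y_te - design(yhat_te) @ gamma).std(ddof=2)

    # Naive analysis: treat predictions as if they were observed outcomes.
    yhat_va = features(x_va, z_va) @ w
    naive_beta, naive_se = ols_with_se(design(x_va), yhat_va)

    # Step 3: bootstrap correction on the validation set. Each replicate
    # resamples rows, simulates outcomes from the relationship model,
    # and refits the downstream regression.
    betas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        y_sim = gamma[0] + gamma[1] * yhat_va[idx] + rng.normal(scale=sigma, size=n)
        beta_b, _ = ols_with_se(design(x_va[idx]), y_sim)
        betas[b] = beta_b[1]

    return {
        "naive_beta": naive_beta[1],
        "naive_se": naive_se[1],
        "corrected_beta": float(np.median(betas)),
        "corrected_se": float(betas.std(ddof=1)),
    }


result = run_sketch()
print(result)
```

In this simulated setting the naive standard error is too small because the predictions omit the outcome noise; the bootstrap step restores that variability through the relationship model estimated on the testing set.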

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7720220
DOI: http://dx.doi.org/10.1073/pnas.2001238117
