Background: Random forests have become popular for clinical risk prediction modeling. In a case study on predicting ovarian malignancy, we observed training AUCs close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behavior of random forests for probability estimation by (1) visualizing the data space in three real-world case studies and (2) conducting a simulation study.

Methods: For the case studies, multinomial risk estimates were visualized using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data-generating mechanisms (DGMs), varying the predictor distribution, the number of predictors, the correlation between predictors, the true AUC, and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 with binary outcomes were simulated, and random forest models were trained with a minimum node size of 2 or 20 using the ranger R package, resulting in 192 scenarios in total (48 DGMs × 2 training set sizes × 2 minimum node sizes). Model performance was evaluated on large test datasets (N = 100,000).
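The scenario count and the idea of a logistic data-generating mechanism can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's code: the abstract does not list the DGM factor levels, so the DGMs are represented by placeholder IDs, and the coefficients, intercept, predictor distribution, and the helper name `simulate_logistic_dataset` are hypothetical.

```python
import itertools
import math
import random

# The abstract gives only the totals (48 DGMs, 2 training sizes, 2 node sizes).
# The DGM IDs below are placeholders, not the paper's actual factor levels.
DGM_IDS = range(1, 49)
TRAIN_SIZES = (200, 4000)
MIN_NODE_SIZES = (2, 20)

# One scenario per (DGM, training size, minimum node size) combination.
scenarios = list(itertools.product(DGM_IDS, TRAIN_SIZES, MIN_NODE_SIZES))


def simulate_logistic_dataset(n, coefs, intercept, rng):
    """Draw n rows with independent standard-normal predictors and a binary
    outcome whose event probability follows a logistic model."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        linear_predictor = intercept + sum(b * xi for b, xi in zip(coefs, x))
        p = 1.0 / (1.0 + math.exp(-linear_predictor))  # true event probability
        y = 1 if rng.random() < p else 0               # Bernoulli outcome
        rows.append((x, p, y))
    return rows


rng = random.Random(2024)
# Hypothetical DGM: 4 continuous predictors with arbitrary coefficients.
train = simulate_logistic_dataset(200, coefs=[0.8, 0.8, 0.4, 0.4],
                                  intercept=-1.0, rng=rng)
print(len(scenarios), len(train))
```

Because each simulated row keeps its true probability `p`, the same function could in principle also generate a large test set against which estimated probabilities are compared.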

Results: The visualizations suggested that the model learned "spikes of probability" around events in the training set. A cluster of events created a bigger peak or plateau (signal), whereas isolated events created local peaks (noise). In the simulation study, median training AUCs were between 0.97 and 1 unless there were 4 binary predictors or 16 binary predictors with a minimum node size of 20. The median discrimination loss, i.e., the difference between the median test AUC and the true AUC, was 0.025 (range 0.00 to 0.13). Median training AUCs had Spearman correlations of around 0.70 with discrimination loss. Median test AUCs were higher with more events per variable, higher minimum node size, and binary predictors. Median training calibration slopes were always above 1 and were not correlated with median test slopes across scenarios (Spearman correlation -0.11). Median test slopes were higher with higher true AUC, higher minimum node size, and higher sample size.
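The performance measures named above can be made concrete with a minimal sketch. It assumes the usual definitions: AUC as the probability that a random event is scored above a random non-event, discrimination loss as true AUC minus median test AUC, and calibration slope as the coefficient from a logistic regression of the outcome on the logit of the predicted probability. The function names and toy numbers are hypothetical, and the Newton-Raphson fit is a bare-bones stand-in for a proper logistic regression routine.

```python
import math
import statistics


def auc(scores, labels):
    """Fraction of (event, non-event) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def discrimination_loss(true_auc, test_aucs):
    """True AUC minus the median test AUC across simulation runs."""
    return true_auc - statistics.median(test_aucs)


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 to keep the value finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibration_slope(probs, labels, iterations=25):
    """Slope b from fitting P(y = 1) = expit(a + b * logit(p)) by Newton-Raphson."""
    z = [logit(p) for p in probs]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, labels):
            mu = 1.0 / (1.0 + math.exp(-(a + b * zi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * zi     # gradient w.r.t. slope
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


# Toy values for illustration only; these are not results from the study.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
toy_loss = discrimination_loss(0.75, [0.72, 0.70, 0.71])
# Predictions (0.1, 0.5, 0.9) more extreme than observed rates (0.2, 0.5, 0.8).
probs = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
labels = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
toy_slope = calibration_slope(probs, labels)
print(toy_auc, round(toy_loss, 3), round(toy_slope, 3))
```

In the toy data the predictions are more extreme than the observed event rates, so the fitted slope falls below 1; a slope above 1, as reported for the training data, indicates predictions that are not extreme enough.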

Conclusions: Random forests learn local probability peaks that often yield near-perfect training AUCs without strongly affecting AUCs on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437774
DOI: http://dx.doi.org/10.1186/s41512-024-00177-1

Similar Publications

Objective: To develop predictive models for assessing deep vein thrombosis (DVT) risk among lumbar disc herniation (LDH) patients and evaluate their performances.

Methods: A retrospective study was conducted on 798 LDH patients treated at the First Hospital of Hebei Medical University from January 2017 to December 2023. The patients were divided into a training set (n = 558) and a test set (n = 240) using computer-generated random numbers in a ratio of 7:3.

Background: Ultrasound-based radiomics prediction models can improve the differentiation of benign and malignant thyroid nodules and help avoid overtreatment. This study evaluates the role of predictive models based on intranodular and perinodular ultrasound radiomics in distinguishing between benign and malignant thyroid nodules.

Methods: A total of 1,076 thyroid nodules were enrolled from three hospitals between 2016 and 2022, forming the training, validation and test cohorts.

Background And Aims: We sought to develop a minimally-invasive, robust, accessible nonendoscopic strategy to diagnose Barrett's esophagus (BE), esophageal adenocarcinoma (EAC), and its immediate precursor lesion, high-grade dysplasia (HGD) based on methylated DNA biomarkers applied to a retrievable sponge-capsule device in a cohort representative of the BE population (i.e., mostly short-segment, non-dysplastic BE, NDBE).

Objective: The objective of this study is to examine the potential of specific parameters in determining renal involvement in adult patients diagnosed with Immunoglobulin A vasculitis (IgAV).

Methods: Records of patients with IgAV who met the EULAR/PRINTO/PRES classification criteria and were diagnosed between January 2017 and January 2022 were retrospectively reviewed. The Birmingham Vasculitis Activity Score (BVAS) version 3 was used to assess initial disease activity.

Development of a nomogram for overall survival in patients with esophageal carcinoma: A prospective cohort study in China.

World J Gastrointest Oncol

January 2025

Chongqing Cancer Multi-omics Big Data Application Engineering Research Center, Chongqing University Cancer Hospital, Chongqing 400030, China.

Background: Esophageal carcinoma (EC) presents a significant public health issue in China, with its prognosis impacted by myriad factors. The creation of a reliable prognostic model for the overall survival (OS) of EC patients promises to greatly advance the customization of treatment approaches.

Aim: To create a more systematic and practical model that incorporates clinically significant indicators to support decision-making in clinical settings.
