Background: Choosing the best-performing method for outcome prediction or variable selection is a recurring problem in prognosis studies, and it has led to many publications comparing methods. Some aspects, however, have received little attention. First, most comparison studies treat prediction performance and variable selection separately. Second, methods are compared either within a binary outcome setting (where the goal is to predict whether readmission will occur within an arbitrarily chosen delay) or within a survival analysis setting (where the outcomes are the censored times themselves), but not both. In this paper, we propose a comparison methodology to weigh up these settings in terms of both prediction and variable selection, while incorporating advanced machine learning strategies.

Methods: Using a high-dimensional case study on a sickle-cell disease (SCD) cohort, we compare 8 statistical methods. In the binary outcome setting, we consider logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting (GB) and neural network (NN); in the survival analysis setting, we consider the Cox Proportional Hazards (PH), the CURE and the C-mix models. We also propose a method using Gaussian Processes to extract meaningful structured covariates from longitudinal data.

Results: Among all assessed statistical methods, the survival analysis ones obtain the best results. In particular, the C-mix model yields the best performance in both settings (AUC = 0.94 in the binary outcome setting), as well as interesting interpretation aspects. There is some consistency in the selected covariates across methods within a setting, but not much across the two settings.

Conclusions: Learning within the survival analysis setting first (thus using all the temporal information), and then going back to a binary prediction using the survival estimates, appears to give significantly better prediction performance than models trained "directly" within the binary outcome setting.
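
As an illustration of the "going back to a binary prediction" step, the sketch below derives binary labels from censored times for a chosen delay, turns a survival estimate into a risk score 1 - S(delay), and scores it with a rank-based AUC. It is a minimal sketch under stated assumptions, not the paper's implementation: the exponential survival model stands in for the Cox PH, CURE or C-mix estimates, subjects censored before the delay are simply excluded (the abstract does not say how they are handled), and all function names and data values are hypothetical.

```python
import numpy as np


def binarize_outcome(times, events, delay):
    """Turn censored durations into binary labels for a given delay.

    Label 1: the event was observed within `delay`.
    Label 0: the subject was still event-free at `delay`.
    Subjects censored before `delay` have an unknown label and are
    flagged as not usable (an assumption of this sketch).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    labels = (events & (times <= delay)).astype(int)
    usable = events | (times > delay)
    return labels, usable


def risk_within_delay(hazard_rates, delay):
    """Risk score 1 - S(delay) under a stand-in exponential model,
    S(t) = exp(-rate * t). Any survival estimate could be used instead."""
    rates = np.asarray(hazard_rates, dtype=float)
    return 1.0 - np.exp(-rates * delay)


def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count for one half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Toy data, made up for illustration only.
times = [5, 12, 40, 8, 60, 25]      # follow-up durations
events = [1, 1, 0, 0, 1, 1]         # 1 = event observed, 0 = censored
rates = [0.10, 0.05, 0.01, 0.03, 0.005, 0.008]  # hypothetical per-subject hazards
delay = 30

labels, usable = binarize_outcome(times, events, delay)
scores = risk_within_delay(rates, delay)
auc_value = auc(labels[usable], scores[usable])
print(f"AUC at delay {delay}: {auc_value:.3f}")
```

The survival model is fitted on all durations, censored or not; only the evaluation step needs the binary labels, which is why the delay can be chosen after training.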

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404305
DOI: http://dx.doi.org/10.1186/s12874-019-0673-4
