Predictive modeling has become a central technique in neuroimaging to identify complex brain-behavior relationships and test their generalizability to unseen data. However, data leakage, which unintentionally breaches the separation between data used to train and test the model, undermines the validity of predictive models. Previous literature suggests that leakage is pervasive in machine learning, but few studies have empirically evaluated the effects of leakage in neuroimaging data. Although leakage is always an incorrect practice, understanding its effects on neuroimaging predictive models provides insight into the extent to which leakage may affect the literature. Here, we investigated the effects of leakage on machine learning models in two common neuroimaging modalities, functional and structural connectomes. Using over 400 different pipelines spanning four large datasets and three phenotypes, we evaluated five forms of leakage fitting into three broad categories: feature selection, covariate correction, and lack of independence between subjects. As expected, leakage via feature selection and repeated subjects drastically inflated prediction performance. Notably, other forms of leakage had only minor effects (e.g., leaky site correction) or even decreased prediction performance (e.g., leaky covariate regression). In some cases, leakage affected not only prediction performance but also model coefficients, and thus neurobiological interpretations. Finally, we found that predictive models using small datasets were more sensitive to leakage. Overall, our results illustrate the variable effects of leakage on prediction pipelines and underscore the importance of avoiding data leakage to improve the validity and reproducibility of predictive modeling.
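To make the feature-selection form of leakage concrete, the following is a minimal sketch and not the authors' pipeline. It assumes purely random features and a random target, so any apparent predictive performance is spurious. The data sizes, the correlation-based selection rule, the summed-feature linear model, and all function names are illustrative assumptions.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def select_features(X, y, rows, k):
    """Pick the k features most correlated with y, using only `rows`."""
    ys = [y[i] for i in rows]
    scored = []
    for j in range(len(X[0])):
        r = pearson([X[i][j] for i in rows], ys)
        scored.append((abs(r), j, 1.0 if r > 0 else -1.0))
    scored.sort(reverse=True)
    return [(j, sign) for _, j, sign in scored[:k]]

def summary_score(X, i, feats):
    """Sign-aligned sum of the selected features for subject i."""
    return sum(sign * X[i][j] for j, sign in feats)

def cross_validate(X, y, k_feats, n_folds, leaky):
    """Return the correlation between out-of-fold predictions and y."""
    n = len(y)
    all_rows = list(range(n))
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # Leaky variant: features are chosen once, using every subject,
    # including those that will later be held out for testing.
    global_feats = select_features(X, y, all_rows, k_feats) if leaky else None
    preds = [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        # Leak-free variant: features are chosen from training subjects only.
        feats = global_feats if leaky else select_features(X, y, train, k_feats)
        s_train = [summary_score(X, i, feats) for i in train]
        y_train = [y[i] for i in train]
        ms = sum(s_train) / len(train)
        my = sum(y_train) / len(train)
        # Univariate least-squares fit of y on the summary score.
        slope = (sum((s - ms) * (t - my) for s, t in zip(s_train, y_train))
                 / sum((s - ms) ** 2 for s in s_train))
        intercept = my - slope * ms
        for i in test:
            preds[i] = intercept + slope * summary_score(X, i, feats)
    return pearson(preds, y)

# Hypothetical null data: 50 "subjects", 1000 noise features, noise target.
rng = random.Random(0)
n_subjects, n_features = 50, 1000
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_subjects)]
y = [rng.gauss(0, 1) for _ in range(n_subjects)]

leaky_r = cross_validate(X, y, k_feats=20, n_folds=5, leaky=True)
clean_r = cross_validate(X, y, k_feats=20, n_folds=5, leaky=False)
print(f"leaky feature selection:     r = {leaky_r:.2f}")
print(f"leak-free feature selection: r = {clean_r:.2f}")
```

Because the target is unrelated to the features by construction, the leak-free estimate should hover around zero, whereas selecting features on the full sample lets test-set information shape the model and inflates the apparent correlation. The same principle applies to covariate and site correction: any step fit to the data belongs inside the training folds.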


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10793416
DOI: http://dx.doi.org/10.1101/2023.06.09.544383


