AI Article Synopsis

  • A growing trend in research is using deep neural networks (DNNs) on EEG recordings to diagnose disorders, but there's concern about data leakage.
  • Many studies randomly assign EEG segments to the training or test set, so data from the same subject can end up in both sets, which may inflate reported accuracy.
  • Our research shows that performance metrics using a segment-based holdout strategy can greatly overestimate classifier accuracy on new subjects, highlighting a major issue in current DNN-EEG studies.

Article Abstract

A growing number of studies apply deep neural networks (DNNs) to recordings of human electroencephalography (EEG) to identify a range of disorders. In many studies, EEG recordings are split into segments, and each segment is randomly assigned to the training or test set. As a consequence, data from individual subjects appears in both the training and the test set. Could high test-set accuracy reflect data leakage from subject-specific patterns in the data, rather than patterns that identify a disease? We address this question by testing the performance of DNN classifiers using segment-based holdout (in which segments from one subject can appear in both the training and test set), and comparing this to their performance using subject-based holdout (where all segments from one subject appear exclusively in either the training set or the test set). In two datasets (one classifying Alzheimer's disease, and the other classifying epileptic seizures), we find that performance on previously-unseen subjects is strongly overestimated when models are trained using segment-based holdout. Finally, we survey the literature and find that the majority of translational DNN-EEG studies use segment-based holdout. Most published DNN-EEG studies may dramatically overestimate their classification performance on new subjects.
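The difference between the two holdout strategies described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy subject IDs, the 20% test fraction, and the representation of a segment as a `(subject_id, segment_index)` pair are all assumptions made for illustration. The sketch only shows how segments are assigned to the two sets and checks whether any subject ends up in both.

```python
import random

def segment_based_holdout(segments, test_fraction=0.2, seed=0):
    """Randomly assign each segment to train or test, ignoring subject identity."""
    rng = random.Random(seed)
    shuffled = segments[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def subject_based_holdout(segments, test_fraction=0.2, seed=0):
    """Assign whole subjects to train or test, so no subject appears in both."""
    rng = random.Random(seed)
    subjects = sorted({subject for subject, _ in segments})
    rng.shuffle(subjects)
    n_test = max(1, int(round(len(subjects) * test_fraction)))
    test_subjects = set(subjects[:n_test])
    train = [s for s in segments if s[0] not in test_subjects]
    test = [s for s in segments if s[0] in test_subjects]
    return train, test

def shared_subjects(train, test):
    """Return the subjects whose segments occur in both sets."""
    return {subject for subject, _ in train} & {subject for subject, _ in test}

# Toy data (hypothetical): 10 subjects with 50 segments each.
segments = [(f"subj{i:02d}", j) for i in range(10) for j in range(50)]

seg_train, seg_test = segment_based_holdout(segments)
sub_train, sub_test = subject_based_holdout(segments)

print("segment-based, subjects in both sets:", len(shared_subjects(seg_train, seg_test)))
print("subject-based, subjects in both sets:", len(shared_subjects(sub_train, sub_test)))
```

With segment-based holdout, the test set contains segments from subjects the model has already seen during training, so test accuracy can reflect subject-specific patterns. With subject-based holdout the overlap is empty by construction, which is the condition needed to estimate performance on previously unseen subjects.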

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11099244
DOI: http://dx.doi.org/10.3389/fnins.2024.1373515

Publication Analysis

Top Keywords

test set: 16
training test: 12
segment-based holdout: 12
data leakage: 8
holdout segments: 8
segments subject: 8
subject appear: 8
dnn-eeg studies: 8
studies: 5
set: 5
