Semantically redundant training data removal and deep model classification performance: A study with chest X-rays.

Sivaramakrishnan Rajaraman, Ghada Zamzmi, Feng Yang, Zhaohui Liang, Zhiyun Xue, Sameer Antani

Comput Med Imaging Graph

National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Published: July 2024

Deep learning (DL) can learn hierarchical features directly from complex, multi-dimensional data. A common understanding is that its performance scales with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, which is the presence of similar or repetitive information, can occur when multiple images have highly similar presentations of the disease of interest. In addition, augmentation methods commonly used to generate variety in DL training could limit performance when applied indiscriminately to such data. We hypothesize that semantic redundancy tends to lower performance and limit generalizability to unseen data, and we examine its impact on classifier performance even when large amounts of training data are available. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data. Using the publicly available NIH chest X-ray dataset, we demonstrate that the model trained on the resulting informative subset significantly outperforms the model trained on the full training set during both internal testing (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.
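The abstract does not say how the entropy score is computed or where the cut-off is placed. As a rough illustration only, the sketch below assumes one plausible reading: each training image is scored by the Shannon entropy of a model's predicted class probabilities, and the lowest-scoring (least informative) samples are dropped. The function names, the `keep_fraction` parameter, and the example probabilities are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one predicted probability vector."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def select_informative(
    probs_per_sample: Sequence[Sequence[float]], keep_fraction: float
) -> List[int]:
    """Return indices of the highest-entropy samples, in original order.

    Assumption (not from the paper): higher predictive entropy is treated
    as "more informative", and a fixed fraction of samples is retained.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    scores = [shannon_entropy(p) for p in probs_per_sample]
    n_keep = max(1, math.ceil(keep_fraction * len(scores)))
    # Rank sample indices from highest to lowest entropy.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical softmax outputs for four training images (two classes).
example_probs = [
    [0.99, 0.01],  # confident prediction, low entropy
    [0.55, 0.45],  # uncertain prediction, high entropy
    [0.98, 0.02],
    [0.70, 0.30],
]
kept = select_informative(example_probs, keep_fraction=0.5)
print(kept)  # → [1, 3]
```

In this toy run the two near-certain predictions are dropped and the two uncertain ones are kept. The scoring and selection rule actually used in the study may differ, so the full text should be consulted before reusing this idea.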

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11144082
DOI: http://dx.doi.org/10.1016/j.compmedimag.2024.102379

Similar Publications

Drug Development.

Alzheimers Dement

December 2024

Signant Health, Blue Bell, PA, USA.

Amanda Hackebeil, Gila Barbati, Sayaka Machizawa, Jessica Stenclik, Erica R Appleman

Background: In Alzheimer's Disease trials, the Mini-Mental State Examination (MMSE) and Clinical Dementia Rating (CDR) are commonly utilized as inclusionary criteria at screening. These measures, however, do not always reaffirm inclusionary status at baseline. Score changes between screening and baseline visits may imply potential score inflation at screening leading to inappropriate participant enrollment.

Drug Development.

Alzheimers Dement

December 2024

Genentech, Inc., South San Francisco, CA, USA.

Marina Ritchie, Seema Datta, Cecilia Monteiro, Balazs Toth, Edmond Teng

Background: Participant retention is a key determinant for a successful clinical trial. In Alzheimer's disease (AD) trials, participants are typically required to enroll with a study partner, which adds barriers to retention. Previous analyses of North American trial data found that most study partners were spouses and that such dyads had higher study completion rates than other study partner types.

Drug Development.

Alzheimers Dement

December 2024

University of California, Irvine, Irvine, CA, USA.

Adam I Birnbaum, Zion T Grant-Freeman, Joshua D Grill, Dan Hoang, Adrijana Gombosev

Background: Recruitment registries are tools to decrease the time and cost required to identify and enroll eligible participants into clinical research. Despite their potential to increase the efficiency of accrual, few analyses have assessed registry effectiveness. We investigated the outcomes of study referrals from the Consent-to-Contact (C2C) registry, a recruitment registry at the University of California, Irvine.

Drug Development.

Alzheimers Dement

December 2024

Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, Netherlands.

Leonie N C Visser, Tjeerd Fluitman, Aniek M van Gils, Pieter J van der Veere, Argonde C van Harten

Background: The first disease-modifying treatments (DMTs) for Alzheimer's disease (AD) have been approved in the USA, marking profound changes in AD-diagnosis and treatment. This will bring new challenges in terms of clinician-patient communication. We aimed to collect the perspectives of memory clinic professionals regarding the most important topics to address and what (tools) would support professionals and their patients and care partners to engage in a meaningful conversation on whether (or not) to initiate treatment.

Drug Development.

Alzheimers Dement

December 2024

Unlearn.AI, San Francisco, CA, USA.

Amin Yakubu, Jennifer Bogert, Run Zhuang, Gayle Wittenberg, Christine Pozniak

Background: Pivotal Alzheimer's Disease (AD) trials typically require thousands of participants, resulting in long enrollment timelines and substantial costs. We leverage deep learning predictive models to create prognostic scores (forecasted control outcomes) for trial participants and combine them with a linear statistical model to increase statistical power in randomized clinical trials (RCTs). This is a straightforward extension of the traditional RCT analysis, allowing for ease of use in any clinical program.
