Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study.

John R Zech, Marcus A Badgeley, Manway Liu, Anthony B Costa, Joseph J Titano, Eric Karl Oermann

PLoS Med

Department of Neurological Surgery, Icahn School of Medicine, New York, New York, United States of America.

Published: November 2018

Background: There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task.

Methods And Findings: A cross-sectional design with multiple model training cohorts was used to evaluate model generalizability to external sites using split-sample validation. A total of 158,323 chest radiographs were drawn from three institutions: National Institutes of Health Clinical Center (NIH; 112,120 from 30,805 patients), Mount Sinai Hospital (MSH; 42,396 from 12,904 patients), and Indiana University Network for Patient Care (IU; 3,807 from 3,683 patients). These patient populations had an age mean (SD) of 46.9 years (16.6), 63.2 years (16.5), and 49.6 years (17) with a female percentage of 43.5%, 44.8%, and 57.3%, respectively. We assessed individual models using the area under the receiver operating characteristic curve (AUC) for radiographic findings consistent with pneumonia and compared performance on different test sets with DeLong's test. The prevalence of pneumonia was high enough at MSH (34.2%) relative to NIH and IU (1.2% and 1.0%) that merely sorting by hospital system achieved an AUC of 0.861 (95% CI 0.855-0.866) on the joint MSH-NIH dataset. Models trained on data from either NIH or MSH had equivalent performance on IU (P values 0.580 and 0.273, respectively) and inferior performance on data from each other relative to an internal test set (i.e., new data from within the hospital system used for training data; P values both <0.001). The highest internal performance was achieved by combining training and test data from MSH and NIH (AUC 0.931, 95% CI 0.927-0.936), but this model demonstrated significantly lower external performance at IU (AUC 0.815, 95% CI 0.745-0.885, P = 0.001). To test the effect of pooling data from sites with disparate pneumonia prevalence, we used stratified subsampling to generate MSH-NIH cohorts that only differed in disease prevalence between training data sites. When both training data sites had the same pneumonia prevalence, the model performed consistently on external IU data (P = 0.88). 
When a 10-fold difference in pneumonia rate was introduced between sites, internal test performance improved compared to the balanced model (10× MSH risk P < 0.001; 10× NIH P = 0.002), but this outperformance failed to generalize to IU (MSH 10× P < 0.001; NIH 10× P = 0.027). CNNs were able to directly detect hospital system of a radiograph for 99.95% NIH (22,050/22,062) and 99.98% MSH (8,386/8,388) radiographs. The primary limitation of our approach and the available public data is that we cannot fully assess what other factors might be contributing to hospital system-specific biases.
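The prevalence confound described above can be checked with simple arithmetic. The sketch below is not the authors' code: it assumes the full per-site image counts and the rounded prevalences quoted in the abstract (the reported 0.861 was presumably computed on a specific joint dataset split, so only approximate agreement is expected), and the function name `auc_of_site_indicator` is hypothetical. It scores every MSH image as 1 and every NIH image as 0, then computes the AUC of that two-level score.

```python
def auc_of_site_indicator(n_a, prev_a, n_b, prev_b):
    """AUC of a 'classifier' that outputs 1 for site A and 0 for site B.

    n_a, n_b: number of images at each site.
    prev_a, prev_b: fraction of images labeled positive at each site.
    """
    # Expected positives and negatives per site.
    pos_a, pos_b = n_a * prev_a, n_b * prev_b
    neg_a, neg_b = n_a - pos_a, n_b - pos_b

    # Thresholding between the two score levels gives one ROC point.
    tpr = pos_a / (pos_a + pos_b)  # share of all positives that come from site A
    fpr = neg_a / (neg_a + neg_b)  # share of all negatives that come from site A

    # Area under the piecewise-linear ROC through (0,0), (fpr,tpr), (1,1).
    return 0.5 * (1.0 + tpr - fpr)


# Assumed inputs: MSH 42,396 images at 34.2%, NIH 112,120 images at 1.2%.
auc = auc_of_site_indicator(42_396, 0.342, 112_120, 0.012)
print(f"AUC from hospital identity alone: {auc:.3f}")
```

Under these assumptions the result lands close to the reported 0.861, which illustrates why a model that learns to recognize the hospital system can appear to detect pneumonia on pooled data without learning anything about the disease.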

Conclusion: Pneumonia-screening CNNs achieved better internal than external performance in 3 out of 5 natural comparisons. When models were trained on pooled data from sites with different pneumonia prevalence, they performed better on new pooled data from these sites but not on external data. CNNs robustly identified hospital system and department within a hospital, which can have large differences in disease burden and may confound predictions.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6219764	PMC
http://dx.doi.org/10.1371/journal.pmed.1002683	DOI Listing

Similar Publications

Giant thymolipoma in a 16-year-old girl with multimodal diagnostic approach and surgical management: a case report.

AME Case Rep

January 2025

Department of Clinical Medicine, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China.

Ziwei Wang, Jicheng Xiong, Lin Peng, Xiaobo Wu, Yongtao Han

Background: Thymolipomas are rare benign mediastinal tumors primarily occurring in young adults, although they can also present in pediatric populations. These tumors are often asymptomatic, but their substantial size can create significant diagnostic and therapeutic challenges, necessitating careful evaluation and management.

Case Description: A teenage girl was diagnosed with a giant thymolipoma, which was discovered incidentally during a routine chest radiograph.

Multilevel support-assisted prototype optimization network for few-shot medical segmentation of lung lesions.

Sci Rep

January 2025

Shandong Provincial Public Health Clinical Center, Shandong University, Jinan, 250013, Shandong, China.

Yuan Tian, Yongquan Liang, Yufeng Chen, Jingjing Zhang, Hongyang Bian

Medical image annotation is scarce and costly. Few-shot segmentation has been widely used to segment medical images from only a few annotated examples. However, research on its use for lesion segmentation in lung diseases is still limited, especially for pulmonary aspergillosis.

Predictors of Hospitalization for Patients Presenting to Emergency Department with COVID-19 Infection.

J Clin Med

January 2025

Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21224, USA.

Alhareth Alsagban, Amteshwar Singh, Anurima Baidya, Monika Dalal, Waseem Khaliq

Predictors of morbidity and mortality in hospitalized COVID-19 patients have been extensively studied. However, comparative analyses of predictors for hospitalization versus discharge from the emergency department remain limited. This retrospective study evaluated predictors of hospitalization among adults (≥18 years) presenting to the emergency department with COVID-19 infection between 1 March 2020 and 15 June 2020.

The Frequency of Mediastinal Lymph Node Calcification in Sarcoidosis Patients and the Influencing Factors.

Medicina (Kaunas)

December 2024

Department of Respiratory Disease, Cukurova University Faculty of Medicine, Yüreğir, Adana 01250, Turkey.

Pelin Pınar Deniz, Pelin Duru Çetinkaya, Saida Mehdiyeva, İsmail Hanta

This study investigates the prevalence of calcification in mediastinal lymph nodes among sarcoidosis patients and the influencing factors. Sarcoidosis is a multisystemic inflammatory disease characterized by non-caseating epithelioid granulomas. Bilateral hilar lymphadenopathy (LAP) is the most common radiographic finding, with studies showing a correlation between the frequency of lymph node calcification and disease duration, with a frequency of 3% relating to a duration of 5 years and a frequency of 20% relating to one of 10 years.

Assessment of scattered and leakage radiation from ultra-portable X-ray systems in chest imaging: An independent study.

PLOS Glob Public Health

January 2025

Médecins Sans Frontières, International, Geneva, Switzerland.

Leonie E Paulis, Roald S Schnerr, Jarred Halton, Zhi Zhen Qin, Arlene Chua

Ultraportable (UP) X-ray devices are ideal to use in community-based settings, particularly for chest X-ray (CXR) screening of tuberculosis (TB). Unfortunately, there is insufficient guidance on the radiation safety of these devices. This study aims to determine the radiation dose by UP X-ray devices to both the public and radiographers compared to international dose limits.
