Deep Learning for Lung Cancer Detection on Screening CT Scans: Results of a Large-Scale Public Competition and an Observer Study with 11 Radiologists.

Radiol Artif Intell

Department of Radiology, Nuclear Medicine and Anatomy, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA, Nijmegen, the Netherlands (C.J., A.A.A.S., E.T.S., P.K.G., H.B., M.B., B.G., S.S., B.v.G.); Department of Digital Technology & Innovation, Siemens Healthineers, Erlangen, Germany (A.A.A.S.); Department of Radiology, University Medical Center Utrecht, Utrecht, the Netherlands (F.A.M.H., P.A.d.J.); ETZ (Elisabeth-TweeSteden Ziekenhuis), Tilburg, the Netherlands (E.R.); Section of Radiology, Department of Medicine and Surgery (DiMeC), University of Parma, Parma, Italy (M.S.); Department of Radiology, Meander Medical Center, Amersfoort, the Netherlands (K.C., S.S.); Department of Radiology, AZ Zeno, Knokke-Heist, Belgium (J.M.); Department of Imaging, Royal Brompton Hospital, London, England (A.D.); Division of Cancer Prevention (P.F.P.) and Center for Biomedical Informatics & Information Technology (K.F.), National Cancer Institute, National Institutes of Health, Bethesda, Md; British Columbia Cancer Agency and the University of British Columbia, Vancouver, Canada (S.C.L.); and Fraunhofer MEVIS, Bremen, Germany (B.v.G.).

Published: November 2021

Purpose: To determine whether deep learning algorithms developed in a public competition could identify lung cancer on low-dose CT scans with a performance similar to that of radiologists.

Materials And Methods: In this retrospective study, a dataset consisting of 300 patient scans was used for model assessment; 150 patient scans were from the competition set and 150 were from an independent dataset. Both test datasets contained 50 cancer-positive scans and 100 cancer-negative scans. The reference standard was set by histopathologic examination for cancer-positive scans and imaging follow-up for at least 2 years for cancer-negative scans. The test datasets were applied to the three top-performing algorithms from the Kaggle Data Science Bowl 2017 public competition: grt123, Julian de Wit and Daniel Hammack (JWDH), and Aidence. Model outputs were compared with an observer study of 11 radiologists that assessed the same test datasets. Each scan was scored on a continuous scale by both the deep learning algorithms and the radiologists. Performance was measured using multireader, multicase receiver operating characteristic analysis.
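Since both the algorithms and the radiologists scored each scan on a continuous scale, performance can be summarized as the area under the ROC curve. As a minimal illustrative sketch only (the study itself used dedicated multireader, multicase ROC software, not this code), the AUC of a set of continuous scores can be computed with the rank-based Mann-Whitney formulation:

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen cancer-positive scan receives a higher score than
    a randomly chosen cancer-negative scan (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy example with made-up scores (not data from the study):
# perfect separation yields AUC = 1.0, identical scores yield 0.5.
print(auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
print(auc([0.5, 0.5], [0.5, 0.5]))  # 0.5
```

This pairwise formulation is equivalent to integrating the empirical ROC curve; the multireader, multicase analysis in the study additionally models reader and case variability when comparing average AUCs.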

Results: The area under the receiver operating characteristic curve (AUC) was 0.877 (95% CI: 0.842, 0.910) for grt123, 0.902 (95% CI: 0.871, 0.932) for JWDH, and 0.900 (95% CI: 0.870, 0.928) for Aidence. The average AUC of the radiologists was 0.917 (95% CI: 0.889, 0.945), which was significantly higher than that of grt123 (P = .02); however, no significant difference was found between the radiologists and JWDH (P = .29) or Aidence (P = .26).

Conclusion: Deep learning algorithms developed in a public competition for lung cancer detection in low-dose CT scans reached performance close to that of radiologists.

Keywords: Lung, CT, Thorax, Screening, Oncology

© RSNA, 2021.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637223
DOI: http://dx.doi.org/10.1148/ryai.2021210027
