Exploring synthetic datasets for computer-aided detection: a case study using phantom scan data for enhanced lung nodule false positive reduction.

J Med Imaging (Bellingham)

FDA, CDRH, OSEL, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, Maryland, United States.

Published: July 2024

Purpose: Synthetic datasets hold the potential to offer cost-effective alternatives to clinical data, ensuring privacy protections and potentially addressing biases in clinical data. We present a method leveraging such datasets to train a machine learning algorithm applied as part of a computer-aided detection (CADe) system.

Approach: Our proposed approach utilizes clinically acquired computed tomography (CT) scans of a physical anthropomorphic phantom into which manufactured lesions were inserted to train a machine learning algorithm. We treated the training database obtained from the anthropomorphic phantom as a simplified representation of clinical data and increased the variability in this dataset using a set of randomized and parameterized augmentations. Furthermore, to mitigate the inherent differences between phantom and clinical datasets, we investigated adding unlabeled clinical data into the training pipeline.

Results: We apply our proposed method to the false positive reduction stage of a lung nodule CADe system in CT scans, in which regions of interest containing potential lesions are classified as nodule or non-nodule regions. Experimental results demonstrate the effectiveness of the proposed method; the system trained on labeled data from physical phantom scans and unlabeled clinical data achieves a sensitivity of 90% at eight false positives per scan. Furthermore, the experimental results demonstrate the benefit of the physical phantom in which the performance in terms of competitive performance metric increased by 6% when a training set consisting of 50 clinical CT scans was enlarged by the scans obtained from the physical phantom.

Conclusions: The scalability of synthetic datasets can lead to improved CADe performance, particularly in scenarios in which the size of the labeled clinical data is limited or subject to inherent bias. Our proposed approach demonstrates an effective utilization of synthetic datasets for training machine learning algorithms.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304989PMC
http://dx.doi.org/10.1117/1.JMI.11.4.044507DOI Listing

Publication Analysis

Top Keywords

clinical data
24
synthetic datasets
16
machine learning
12
computer-aided detection
8
data
8
lung nodule
8
false positive
8
positive reduction
8
clinical
8
train machine
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!