Objective: Little is known about the effects of using different expert-determined reference standards when evaluating the performance of deep learning-based automatic detection (DLAD) models and their added value to radiologists. We assessed the concordance of expert-determined standards with a clinical gold standard (herein, pathological confirmation) and the effects of different expert-determined reference standards on the estimates of radiologists' diagnostic performance to detect malignant pulmonary nodules on chest radiographs with and without the assistance of a DLAD model.

Materials And Methods: This study included chest radiographs from 50 patients with pathologically proven lung cancer and 50 controls. Five expert-determined standards were constructed using the interpretations of 10 experts: individual judgment by the most experienced expert, majority vote, consensus judgments of two and three experts, and a latent class analysis (LCA) model. In separate reader tests, an additional 10 radiologists independently interpreted the radiographs, first unaided and then with the assistance of the DLAD model. Their diagnostic performance was estimated using the clinical gold standard and the various expert-determined standards as the reference standard, and the results were compared using the t-test with Bonferroni correction.

Results: The LCA model (sensitivity, 72.6%; specificity, 100%) was most similar to the clinical gold standard. When expert-determined standards were used, the sensitivities of radiologists and the DLAD model alone were overestimated, and their specificities were underestimated (all p-values < 0.05). DLAD assistance diminished the overestimation of sensitivity but exaggerated the underestimation of specificity (all p-values < 0.001). The DLAD model improved sensitivity and specificity to a greater extent when using the clinical gold standard than when using the expert-determined standards (all p-values < 0.001), except for sensitivity with the LCA model (p = 0.094).

Conclusion: The LCA model was most similar to the clinical gold standard for malignant pulmonary nodule detection on chest radiographs. Expert-determined standards caused bias in measuring the diagnostic performance of the artificial intelligence model.
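
The dependence of performance estimates on the reference standard can be illustrated with a small sketch. The Python below is not the study's code or data: the function names (`majority_vote`, `sensitivity_specificity`), the tie rule (a case is positive only on a strict majority), and all of the toy readings are assumptions made up for illustration. It builds a majority-vote reference standard from several experts' reads and scores one reader against both that standard and a pathology-style gold standard.

```python
from typing import Sequence


def majority_vote(expert_reads: Sequence[Sequence[int]]) -> list[int]:
    """Build a per-case reference label from several experts' binary reads.

    Assumption: a case is positive only when strictly more than half of
    the experts called it positive (ties count as negative).
    """
    n_experts = len(expert_reads)
    n_cases = len(expert_reads[0])
    return [
        1 if 2 * sum(reads[i] for reads in expert_reads) > n_experts else 0
        for i in range(n_cases)
    ]


def sensitivity_specificity(
    reader: Sequence[int], reference: Sequence[int]
) -> tuple[float, float]:
    """Return (sensitivity, specificity) of `reader` against `reference`."""
    tp = sum(1 for r, ref in zip(reader, reference) if r == 1 and ref == 1)
    fn = sum(1 for r, ref in zip(reader, reference) if r == 0 and ref == 1)
    tn = sum(1 for r, ref in zip(reader, reference) if r == 0 and ref == 0)
    fp = sum(1 for r, ref in zip(reader, reference) if r == 1 and ref == 0)
    # Guard against a reference standard with no positives or no negatives.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Invented toy data: 8 cases, the first 4 are pathologically proven cancers.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
experts = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0],
]
reader = [1, 1, 1, 0, 0, 0, 0, 0]  # finds one cancer most experts missed

expert_ref = majority_vote(experts)
print("vs gold standard:  ", sensitivity_specificity(reader, gold))
print("vs majority vote:  ", sensitivity_specificity(reader, expert_ref))
```

In this toy case the reader's correct call on a cancer that most experts missed is scored as a false positive under the majority-vote standard, and the cancer missed by everyone drops out of the sensitivity denominator. That is one way a shift in the estimates can arise; it is not a reproduction of the study's results.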

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892220
DOI: http://dx.doi.org/10.3348/kjr.2022.0548

Publication Analysis

Top Keywords

expert-determined standards: 24
clinical gold: 20
gold standard: 20
diagnostic performance: 16
chest radiographs: 16
lca model: 16
effects expert-determined: 12
expert-determined reference: 12
reference standards: 12
dlad model: 12

Similar Publications

Study Objectives: This paper validates TipTraQ, a compact home sleep apnea testing (HSAT) system. TipTraQ comprises a fingertip-worn device, a mobile application, and a cloud-based deep learning artificial intelligence (AI) system. The device uses photoplethysmography (PPG; red, infrared, and green channels) and accelerometer sensors, and the AI system uses their signals to assess sleep apnea.

Background: Machine learning (ML) can differentiate papilloedema from normal optic discs using fundus photos. Currently, papilloedema severity is assessed using the descriptive, ordinal Frisén scale. We hypothesise that ML can quantify papilloedema and detect a treatment effect on papilloedema due to idiopathic intracranial hypertension.

Objective: To determine the effectiveness of proficiency-based progression (PBP) e-learning in training in communication concerning clinically deteriorating patients.

Design: Single-centre multi-arm randomised double-blind controlled trial with three parallel arms.

Randomisation, Setting And Participants: A computer-generated program randomised 120 final-year medical students at an Irish university and allocated them to three trial groups.

Determinants of gait dystonia severity in cerebral palsy.

Dev Med Child Neurol

July 2023

Departments of Neurology, Radiology, Neuroscience, Physical Therapy, and Occupational Therapy, Washington University School of Medicine, St Louis, MO, USA.

Aim: To determine the movement features governing expert assessment of gait dystonia severity in individuals with cerebral palsy (CP).

Method: In this prospective cohort study, three movement disorder neurologists graded lower extremity dystonia severity in gait videos of individuals with CP using a 10-point Likert-like scale. Using conventional content analysis, we determined the features experts cited when grading dystonia severity.
