Diagnostic Performance of Artificial Intelligence in Chest Radiographs Referred from the Emergency Department.

Julia López Alcolea, Ana Fernández Alfonso, Raquel Cano Alonso, Ana Álvarez Vázquez, Alejandro Díaz Moreno, David García Castellanos, Lucía Sanabria Greciano, Chawar Hayoun, Manuel Recio Rodríguez, Cristina Andreu Vázquez, Israel John Thuissard Vasallo, Vicente Martínez de Vega

Diagnostics (Basel)

Hospital Universitario QuironSalud Madrid, 28223 Madrid, Spain.

Published: November 2024

* Both the AI and the resident reached 100% sensitivity for fractures and pneumothorax; sensitivity was moderate for pulmonary opacity (AI: 76%, resident: 71%), and for pulmonary nodules it was low for the AI (33%) but moderate for the resident (75%).
* The AI system labeled cases as "doubtful" more often than the resident did; agreement between the AI and the resident was only fair in most categories, suggesting limitations of the AI in detecting some important findings.

Background: The increasing integration of AI in chest X-ray evaluation holds promise for enhancing diagnostic accuracy and optimizing clinical workflows. However, understanding its performance in real-world clinical settings is essential.

Objectives: In this study, we evaluated the sensitivity (Se) and specificity (Sp) of an AI-based software (Arterys MICA v29.4.0) alongside a radiology resident in interpreting chest X-rays referred from the emergency department (ED), using a senior radiologist's assessment as the gold standard (GS). We assessed the concordance between the AI system and the resident, noted the frequency of doubtful cases for each category, identified how many were considered positive by the GS, and assessed variables that AI was not trained to detect.
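As an illustration of the measures named above, the following is a minimal sketch of how sensitivity, specificity, and negative predictive value can be computed for one finding against a gold-standard reading. The function name `diagnostic_metrics` and the toy labels are hypothetical and are not study data; the sketch also assumes each reading has already been reduced to present/absent, since the abstract does not state how "doubtful" labels were handled in these calculations.

```python
def diagnostic_metrics(reference, predicted):
    """Sensitivity, specificity and NPV for binary labels (True = finding present).

    `reference` is the gold-standard reading, `predicted` the reader under test.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false positives
    fn = sum(1 for r, p in pairs if r and not p)      # false negatives

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no positive cases).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # positives correctly detected
        "specificity": ratio(tn, tn + fp),  # negatives correctly cleared
        "npv": ratio(tn, tn + fn),          # negative calls that are truly negative
    }


# Hypothetical toy readings for a single finding (NOT data from the study).
gold_standard = [True, True, True, False, False, False, False, False]
reader_calls = [True, True, False, False, False, False, False, True]

metrics = diagnostic_metrics(gold_standard, reader_calls)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a setting with few positive cases, NPV can remain high even when sensitivity is modest, which is worth keeping in mind when reading the two measures side by side.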

Methods: We conducted a retrospective observational study analyzing chest X-rays from a sample of 784 patients referred from the ED at our hospital. The AI system was trained to detect five categorical variables (pulmonary nodule, pulmonary opacity, pleural effusion, pneumothorax, and fracture) and to assign each a confidence label ("positive", "doubtful", or "negative").

Results: Sensitivity in detecting fractures and pneumothorax was high (100%) for both AI and the resident, moderate for pulmonary opacity (AI = 76%, resident = 71%), and acceptable for pleural effusion (AI = 60%, resident = 67%), with negative predictive values (NPV) above 95% and areas under the curve (AUC) exceeding 0.8. The resident showed moderate sensitivity (75%) for pulmonary nodules, while AI's sensitivity was low (33%). AI assigned a "doubtful" label to some diagnoses, most of which were deemed negative by the GS; the resident expressed doubt less frequently. The Kappa coefficient between the resident and AI was fair (0.3) across most categories, except for pleural effusion, where concordance was moderate (0.5). Our study highlighted additional findings not detected by AI, including 16% prevalence of mediastinal abnormalities, 20% surgical materials, and 20% other pulmonary findings.
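The Kappa coefficient reported above measures agreement between two readers beyond what chance alone would produce. A minimal sketch of Cohen's kappa for two sets of categorical labels follows; the function name `cohen_kappa` and the toy readings are hypothetical and are not study data, and the abstract does not specify whether the authors used an unweighted or weighted variant, so the unweighted form is assumed here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)

    # Observed agreement: fraction of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / n**2

    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy readings for one finding (NOT data from the study).
ai_labels = ["positive"] * 3 + ["negative"] * 7
resident_labels = [
    "positive", "negative", "negative", "negative", "negative",
    "negative", "negative", "negative", "positive", "negative",
]

kappa = cohen_kappa(ai_labels, resident_labels)
print(f"kappa = {kappa:.2f}")
```

The descriptors "fair" (0.3) and "moderate" (0.5) used in the results are consistent with the commonly cited Landis and Koch bands (0.21 to 0.40 fair, 0.41 to 0.60 moderate), although the abstract does not name the scale it applied.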

Conclusions: AI demonstrated utility in identifying most primary findings, with the exception of pulmonary nodules, and its high NPV suggests it may be valuable for screening. Further training of the AI software and broadening its scope to identify additional findings could enhance its detection capabilities and increase its applicability in clinical practice.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11592727
DOI: http://dx.doi.org/10.3390/diagnostics14222592
