AI Article Synopsis

  • Advances in medical imaging have enabled the classification of ankle fractures using AI, showing good internal validity with algorithms based on the AO/OTA 2018 framework.
  • A deep-learning neural network was trained on 7,500 ankle studies and validated using datasets from Sweden and Australia, with key performance metrics indicating high accuracy in fracture classification.
  • The model demonstrated strong external validity, maintaining high performance despite differences between the datasets, suggesting its potential applicability in diverse clinical settings.

Article Abstract

Background: Advances in medical imaging have made it possible to classify ankle fractures using Artificial Intelligence (AI). Recent studies have demonstrated good internal validity for machine learning algorithms using the AO/OTA 2018 classification. This study aimed to externally validate one such model for ankle fracture classification and to explore ways to improve external validity.

Methods: In this retrospective observational study, we trained a deep-learning neural network (7,500 ankle studies) to classify traumatic malleolar fractures according to the AO/OTA classification. Our internal validation dataset (IVD) contained 409 studies collected from Danderyd Hospital in Stockholm, Sweden, between 2002 and 2016. The external validation dataset (EVD) contained 399 studies collected from Flinders Medical Centre, Adelaide, Australia, between 2016 and 2020. Our primary outcome measures were the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) for fracture classification of AO/OTA malleolar (44) fractures. Secondary outcomes were performance on other fractures visible on ankle radiographs and inter-observer reliability of reviewers.
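To make the two primary outcome measures concrete, the following is a minimal sketch of how AUC and AUPR can be computed for one binary outcome (for example, one fracture class versus all others). It is not the study's evaluation code: the function names `roc_auc` and `average_precision` and the toy labels and scores are assumptions made for illustration. AUPR is approximated here by average precision, a common step-wise estimate, and tied scores are not treated specially in that estimate.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at the rank of each positive case."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}, AUPR = {toy_aupr:.3f}")
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for validation sets of a few hundred studies; a rank-based formulation would be preferable for much larger sets.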

Results: For fracture detection, the network attained a weighted mean AUC (wAUC) of 0.86 (95% CI 0.82-0.89) in the EVD, compared with 0.95 (95% CI 0.94-0.97) in the IVD. The area under the precision-recall curve (AUPR) was 0.93 vs. 0.96. The wAUC for individual outcomes (type 44A-C, group 44A1-C3, and subgroup 44A1.1-C3.3) was 0.82 for the EVD and 0.93 for the IVD. The weighted mean AUPR (wAUPR) was 0.59 vs. 0.63. Throughout, the performance was superior to that of a random classifier for the EVD.
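The abstract does not state how the weighted means were weighted. As a sketch under the assumption that each outcome is weighted by its number of positive cases, the calculation and the random-classifier reference points could look as follows. The helper names and all per-class numbers are hypothetical and are not study data. A random classifier has an expected AUC of 0.5, while its expected AUPR equals the prevalence of the positive class, which is why an AUPR well below 1 can still be far better than chance for rare fracture subgroups.

```python
def weighted_mean(values, weights):
    """Mean of per-outcome metrics, weighted by the given case counts."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def random_baseline_aupr(n_positive, n_total):
    """Expected AUPR of a random classifier: the class prevalence."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_positive / n_total


# Hypothetical per-class AUCs and positive-case counts (illustration only).
per_class_auc = {"44A": 0.80, "44B": 0.90, "44C": 0.85}
positives = {"44A": 20, "44B": 60, "44C": 20}

classes = sorted(per_class_auc)
w_auc = weighted_mean(
    [per_class_auc[c] for c in classes],
    [positives[c] for c in classes],
)
print(f"wAUC = {w_auc:.2f}")  # (0.80*20 + 0.90*60 + 0.85*20) / 100 = 0.87

# A class with 20 positives among 400 studies has a chance-level AUPR of 0.05.
print(f"random AUPR baseline = {random_baseline_aupr(20, 400):.2f}")
```

Weighting by case count keeps rare subgroups, whose estimates are noisy, from dominating the summary figure; an unweighted (macro) mean would be the alternative if every class should count equally.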

Conclusion: Although the two datasets had considerable differences, the model transferred well to the EVD and the alternative clinical scenario it represents. The direct clinical implications of this study are that algorithms developed elsewhere need local validation and that discrepancies can be rectified using targeted training. In a wider sense, we believe this opens up possibilities for building advanced treatment recommendations based on exact fracture types that are more objective than current clinical decisions, often influenced by who is present during rounds.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11451058 (PMC)
http://dx.doi.org/10.1186/s12891-024-07884-2 (DOI)
