The integration of artificial intelligence into clinical workflows requires reliable and robust models. Repeatability is a key attribute of model robustness. An ideally repeatable model outputs predictions without variation during independent tests carried out under similar conditions. However, slight variations, though not ideal, may be unavoidable and acceptable in practice. During model development and evaluation, much attention is given to classification performance while model repeatability is rarely assessed, leading to the development of models that are unusable in clinical practice. In this work, we evaluate the repeatability of four model types (binary classification, multi-class classification, ordinal classification, and regression) on images that were acquired from the same patient during the same visit. We study each model's performance on four medical image classification tasks from public and private datasets: knee osteoarthritis, cervical cancer screening, breast density estimation, and retinopathy of prematurity. Repeatability is measured and compared on ResNet and DenseNet architectures. Moreover, we assess the impact of sampling Monte Carlo dropout predictions at test time on classification performance and repeatability. Leveraging Monte Carlo predictions significantly increases repeatability, in particular at the class boundaries, for all tasks on the binary, multi-class, and ordinal models, leading to an average reduction of the 95% limits of agreement by 16 percentage points and of the class disagreement rate by 7 percentage points. The classification accuracy improves in most settings along with the repeatability. Our results suggest that beyond about 20 Monte Carlo iterations, there is no further gain in repeatability. In addition to the higher test-retest agreement, Monte Carlo predictions are better calibrated, so the output probabilities more accurately reflect the true likelihood of being correctly classified.
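The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_forward` is a hypothetical stand-in for a network with dropout kept active at test time, the weights and dropout rate are arbitrary, and the 95% limits of agreement are assumed to follow the usual Bland-Altman form (mean test-retest difference plus or minus 1.96 standard deviations), which the abstract does not spell out.

```python
import random
import statistics


def mc_dropout_predict(stochastic_forward, x, n_iter=20, seed=0):
    """Average n_iter stochastic forward passes (dropout left active)."""
    rng = random.Random(seed)
    outputs = [stochastic_forward(x, rng) for _ in range(n_iter)]
    return sum(outputs) / n_iter


def toy_forward(x, rng, weights=(0.2, 0.5, 0.3), p_drop=0.1):
    """Hypothetical model: a weighted sum of inputs with inverted dropout."""
    # Each input is zeroed with probability p_drop, otherwise rescaled.
    kept = [0.0 if rng.random() < p_drop else xi / (1.0 - p_drop) for xi in x]
    return sum(w * k for w, k in zip(weights, kept))


def limits_of_agreement(test, retest):
    """Assumed Bland-Altman 95% limits: mean difference +/- 1.96 * SD."""
    diffs = [a - b for a, b in zip(test, retest)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    return mean_diff - 1.96 * sd, mean_diff + 1.96 * sd


def class_disagreement_rate(test_labels, retest_labels):
    """Fraction of test-retest pairs assigned to different classes."""
    pairs = list(zip(test_labels, retest_labels))
    return sum(a != b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Two images of the same (hypothetical) patient from the same visit.
    test_image, retest_image = [1.0, 0.8, 0.6], [0.9, 0.8, 0.7]
    print(mc_dropout_predict(toy_forward, test_image))
    print(mc_dropout_predict(toy_forward, retest_image, seed=1))
    print(limits_of_agreement([0.2, 0.4, 0.6, 0.8], [0.1, 0.5, 0.5, 0.9]))
    print(class_disagreement_rate([0, 1, 2, 1], [0, 1, 1, 1]))
```

The default of `n_iter=20` mirrors the abstract's observation that repeatability stops improving beyond about 20 Monte Carlo iterations. In a real framework the stochastic forward pass would be the trained network with its dropout layers left enabled at inference.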

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674698
DOI: http://dx.doi.org/10.1038/s41746-022-00709-3
