Consistency of convolutional neural networks in dermoscopic melanoma recognition: A prospective real-world study about the pitfalls of augmented intelligence.

E V Goessinger, S E Cerminara, A M Mueller, P Gottfrois, S Huber, M Amaral, F Wenz, L Kostner, L Weiss, M Kunz, J-T Maul, S Wespi, E Broman, S Kaufmann, V Patpanathapillai, I Treyer, A A Navarini, L V Maul

J Eur Acad Dermatol Venereol

Department of Dermatology, University Hospital Basel, Basel, Switzerland.

Published: May 2024

The study examines how consistently two commercially available convolutional neural networks (CNNs) assess melanoma risk across repeated real-world dermoscopic images.
Conducted at the University Hospital Basel, it analysed 117 repeated image series of 116 skin lesions and compared the reliability of the two CNNs using variation coefficients and intraclass correlation coefficients.
CNN-1 was more consistent for clinically benign lesions with an initial malignant risk score, while CNN-2 was more consistent for clinically benign lesions with benign scoring; both were least consistent for lesions with an initial malignant risk score and benign assessment.

Background: Deep-learning convolutional neural networks (CNNs) have outperformed even experienced dermatologists in dermoscopic melanoma detection under controlled conditions. It remains unexplored how real-world dermoscopic image transformations affect CNN robustness.

Objectives: To investigate the consistency of melanoma risk assessment by two commercially available CNNs to help formulate recommendations for current clinical use.

Methods: A comparative cohort study was conducted from January to July 2022 at the Department of Dermatology, University Hospital Basel. Five dermoscopic images of 116 different lesions on the torso of 66 patients were captured consecutively by the same operator without deliberate rotation. Classification was performed by two CNNs (CNN-1/CNN-2). Lesions were divided into four subgroups based on their initial risk scoring and clinical dignity assessment. Reliability was assessed by variation and intraclass correlation coefficients. Excisions were performed for melanoma suspicion or two consecutively elevated CNN risk scores, and benign lesions were confirmed by expert consensus (n = 3).
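As an illustration of the variation metric named in the Methods, the following minimal sketch computes a coefficient of variation for each lesion's five repeated risk scores and then the mean across lesions. The abstract does not state whether the sample or population standard deviation was used, so the sample form is an assumption here, and the function names and score values are hypothetical.

```python
from statistics import mean, stdev


def variation_coefficient(scores):
    """Coefficient of variation (in %) of the repeated risk scores of one lesion.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("mean score is zero; coefficient of variation is undefined")
    return 100.0 * stdev(scores) / m


def mean_variation_coefficient(series):
    """Mean of the per-lesion coefficients of variation (the 'mvc' of a subgroup)."""
    return mean([variation_coefficient(s) for s in series])


# Hypothetical risk scores: five repeated images per lesion, not study data.
example_series = [
    [0.10, 0.12, 0.08, 0.11, 0.09],  # stable scoring
    [0.60, 0.20, 0.75, 0.35, 0.50],  # unstable scoring
]

if __name__ == "__main__":
    for scores in example_series:
        print(f"{variation_coefficient(scores):.1f}%")
    print(f"mvc: {mean_variation_coefficient(example_series):.1f}%")
```

A lower value means the CNN returned more similar scores across the repeated images of the same lesion.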

Results: 117 repeated image series of 116 melanocytic lesions (2 melanomas, 16 dysplastic naevi, 29 naevi, 1 solar lentigo, 1 suspicious and 67 benign) were classified. CNN-1 demonstrated superior measurement repeatability for clinically benign lesions with an initial malignant risk score (mean variation coefficient (mvc): CNN-1: 49.5(±34.3)%; CNN-2: 71.4(±22.5)%; p = 0.03), while CNN-2 outperformed for clinically benign lesions with benign scoring (mvc: CNN-1: 49.7(±22.7)%; CNN-2: 23.8(±29.3)%; p = 0.002). Both systems exhibited lowest score consistency for lesions with an initial malignant risk score and benign assessment. In this context, averaging three initial risk scores achieved highest sensitivity of dignity assessment (CNN-1: 94%; CNN-2: 89%). Intraclass correlation coefficients indicated 'moderate'-to-'good' reliability for both systems (CNN-1: 0.80, 95% CI:0.71-0.87, p < 0.001; CNN-2: 0.67, 95% CI:0.55-0.77, p < 0.001).
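The abstract does not say which form of the intraclass correlation coefficient was reported. As a rough sketch only, the one-way random-effects, single-measurement form, ICC(1,1), can be computed from a lesions-by-repetitions table as below; the choice of this form, the function name and the inputs are assumptions made for illustration.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of rows, one row per lesion, one column per repeated image.
    Assumption: this ICC form is illustrative; the study's form is not stated.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two lesions")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each lesion needs the same number (>= 2) of repeated scores")

    grand_mean = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]

    # Between-lesion and within-lesion mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator
```

Values near 1 indicate that most of the score variance comes from differences between lesions instead of differences between repeated images of the same lesion.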

Conclusions: Potential user-induced image changes can significantly influence CNN classification. For clinical application, we recommend using the average of three initial risk scores. Furthermore, we advocate for CNN robustness optimization by cross-validation with repeated image sets.
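The recommendation to average three initial risk scores could be applied along the following lines. This is a sketch under stated assumptions: the decision threshold, the score scale and the function names are hypothetical and are not taken from the study or from either CNN product.

```python
from statistics import mean


def averaged_risk(scores, n_initial=3):
    """Mean of the first n_initial risk scores recorded for one lesion."""
    if len(scores) < n_initial:
        raise ValueError(f"need at least {n_initial} scores, got {len(scores)}")
    return mean(scores[:n_initial])


def is_elevated(scores, threshold, n_initial=3):
    """Flag a lesion when its averaged initial risk reaches the threshold.

    Assumption: the threshold is a hypothetical, device-specific cut-off.
    """
    return averaged_risk(scores, n_initial) >= threshold


if __name__ == "__main__":
    # Hypothetical scores for one lesion: a single outlying image (0.9)
    # does not decide the outcome on its own once three scores are averaged.
    lesion_scores = [0.2, 0.9, 0.1, 0.3, 0.2]
    print(averaged_risk(lesion_scores))
    print(is_elevated(lesion_scores, threshold=0.5))
```

Averaging dampens the effect of a single image whose score deviates because of small changes in how the dermatoscope was positioned.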

Trial Registration: ClinicalTrials.gov (NCT04605822).

DOI: http://dx.doi.org/10.1111/jdv.19777
