Comparison of Vendor-Pretrained and Custom-Trained Deep Learning Segmentation Models for Head-and-Neck, Breast, and Prostate Cancers

Xinru Chen, Yao Zhao, Hana Baroudi, Mohammad D El Basha, Aji Daniel, Skylar S Gay, Cenji Yu, He Wang, Jack Phan, Seungtaek L Choi, Chelain R Goodman, Xiaodong Zhang, Joshua S Niedzielski, Sanjay S Shete, Laurence E Court, Zhongxing Liao, Fredrik Löfman, Peter A Balter, Jinzhong Yang

Diagnostics (Basel)

The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA.

Published: December 2024

Background/objectives: We assessed how local patient and clinical characteristics influence the performance of commercial deep learning (DL) segmentation models for head-and-neck (HN), breast, and prostate cancers.

Methods: Clinical computed tomography (CT) scans and clinically approved contours from 210 patients (53 HN, 49 left breast, 55 right breast, and 53 prostate cancer) were used to train and validate segmentation models within a vendor-supplied DL training toolkit and to evaluate both vendor-pretrained and custom-trained models. Four custom models (HN, left breast, right breast, and prostate) were trained and validated on 30 (training)/5 (validation) HN, 34/5 left breast, 39/5 right breast, and 30/5 prostate patients to auto-segment a total of 24 organs at risk (OARs). Both vendor-pretrained and custom-trained models were then tested on the remaining patients in each group (18 HN, 10 left breast, 11 right breast, and 18 prostate). Auto-segmented contours were compared with clinically approved contours using the Dice similarity coefficient (DSC) and mean surface distance (MSD). The left and right breast models were assessed jointly according to ipsilateral/contralateral location.
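The two evaluation metrics can be illustrated on voxel masks. The abstract does not describe how the metrics were implemented, so the following is only a minimal sketch under stated assumptions: both contours are rasterized as boolean masks on the same CT grid, the surface is taken as mask voxels with at least one face-connected neighbour outside the mask, and MSD is the symmetric variant that averages the two directed mean distances (other definitions, such as pooling all surface distances, also exist). The function names `dice`, `surface_points`, and `mean_surface_distance` are hypothetical and are not part of the vendor toolkit or the authors' code.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two boolean masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    """Physical coordinates (mm) of boundary voxels.

    A voxel is on the boundary if it is in the mask and at least one of its
    face-connected neighbours (6 in 3D, 4 in 2D) is outside the mask.
    """
    mask = mask.astype(bool)
    # Pad with background so that np.roll never wraps foreground across edges.
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        interior &= np.roll(padded, 1, axis=axis)
        interior &= np.roll(padded, -1, axis=axis)
    interior = interior[tuple(slice(1, -1) for _ in range(mask.ndim))]
    border = mask & ~interior
    # Convert voxel indices to millimetres using the (assumed) voxel spacing.
    return np.argwhere(border) * np.asarray(spacing, dtype=float)


def mean_surface_distance(a, b, spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric MSD: mean of the two directed average surface distances (mm)."""
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("MSD is undefined when either mask is empty")
    # Brute-force pairwise Euclidean distances; adequate for small illustrative masks.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


if __name__ == "__main__":
    ref = np.zeros((20, 20, 20), dtype=bool)
    ref[5:15, 5:15, 5:15] = True   # "clinical" contour: a 10-voxel cube
    auto = np.zeros_like(ref)
    auto[6:16, 5:15, 5:15] = True  # "auto" contour: same cube shifted one slice
    print(f"DSC = {dice(ref, auto):.3f}")  # 900 overlapping voxels / 1000 each -> 0.900
    print(f"MSD = {mean_surface_distance(ref, auto, spacing=(2.5, 1.0, 1.0)):.3f} mm")
```

In this toy example, a one-slice shift along a 2.5 mm slice axis lowers DSC only to 0.90 while MSD reports the displacement directly in millimetres, which is why overlap and distance metrics are usually reported together. For clinical-size structures, a distance transform would replace the brute-force pairwise distance matrix.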

Results: The average DSCs across all structures for the vendor-pretrained and custom-trained models were as follows: 0.81 ± 0.12 and 0.86 ± 0.11 in HN; 0.67 ± 0.16 and 0.80 ± 0.11 in the breast; and 0.87 ± 0.09 and 0.92 ± 0.06 in the prostate. The corresponding average MSDs were 0.81 ± 0.76 mm and 0.76 ± 0.56 mm (HN), 4.85 ± 2.44 mm and 2.42 ± 1.49 mm (breast), and 2.17 ± 1.39 mm and 1.21 ± 1.00 mm (prostate). Notably, custom-trained models showed significant improvements over vendor-pretrained models for 14 of 24 OARs, reflecting the influence of data and contouring variations on segmentation performance.

Conclusions: These findings underscore the substantial impact of institutional preferences and clinical practices on the implementation of vendor-pretrained models. We also found that a relatively small amount of institutional data was sufficient to train customized segmentation models with adequate accuracy.