Need for Objective Task-Based Evaluation of Image Segmentation Algorithms for Quantitative PET: A Study with ACRIN 6668/RTOG 0235 Multicenter Clinical Trial Data.

Ziping Liu Joyce C Mhlanga Huitian Xia Barry A Siegel Abhinav K Jha

J Nucl Med

Department of Biomedical Engineering, Washington University, St. Louis, Missouri;

Published: March 2024

Reliable performance of PET segmentation algorithms on clinically relevant tasks is required for their clinical translation. However, these algorithms are typically evaluated using figures of merit (FoMs) that are not explicitly designed to correlate with clinical task performance. Such FoMs include the Dice similarity coefficient (DSC), the Jaccard similarity coefficient (JSC), and the Hausdorff distance (HD). The objective of this study was to investigate whether evaluating PET segmentation algorithms using these task-agnostic FoMs yields interpretations consistent with evaluation on clinically relevant quantitative tasks. We conducted a retrospective study to assess the concordance in the evaluation of segmentation algorithms using the DSC, JSC, and HD and on the tasks of estimating the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of primary tumors from PET images of patients with non-small cell lung cancer. The PET images were collected from the American College of Radiology Imaging Network 6668/Radiation Therapy Oncology Group 0235 multicenter clinical trial data. The study was conducted in 2 contexts: (1) evaluating conventional segmentation algorithms, namely those based on thresholding (SUV40% and SUV50%), boundary detection (Snakes), and stochastic modeling (Markov random field-Gaussian mixture model); (2) evaluating the impact of network depth and loss function on the performance of a state-of-the-art U-net-based segmentation algorithm. Evaluation of conventional segmentation algorithms based on the DSC, JSC, and HD showed that SUV40% significantly outperformed SUV50%. However, SUV40% yielded lower accuracy on the tasks of estimating MTV and TLG, with a 51% and 54% increase, respectively, in the ensemble normalized bias. Similarly, the Markov random field-Gaussian mixture model significantly outperformed Snakes on the basis of the task-agnostic FoMs but yielded a 24% increased bias in estimated MTV. For the U-net-based algorithm, our evaluation showed that although the network depth did not significantly alter the DSC, JSC, and HD values, a deeper network yielded substantially higher accuracy in the estimated MTV and TLG, with a decreased bias of 91% and 87%, respectively. Additionally, whereas there was no significant difference in the DSC, JSC, and HD values for different loss functions, up to a 73% and 58% difference in the bias of the estimated MTV and TLG, respectively, existed. Evaluation of PET segmentation algorithms using task-agnostic FoMs could yield findings discordant with evaluation on clinically relevant quantitative tasks. This study emphasizes the need for objective task-based evaluation of image segmentation algorithms for quantitative PET.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10924158	PMC
http://dx.doi.org/10.2967/jnumed.123.266018	DOI Listing

Publication Analysis

Top Keywords

segmentation algorithms

dsc jsc

pet segmentation

clinically relevant

task-agnostic foms

mtv tlg

estimated mtv

segmentation

algorithms

objective task-based

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!