The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge: A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset.

Radiol Artif Intell

Departments of Radiology (A.D.D., G.E.G., B.A.H., A.S.C.) and Electrical Engineering (A.D.D., B.A.H.), Stanford University, Lucas Center for Imaging, 1201 Welch Rd, PS 055B, Stanford, CA 94305; Department of Radiology, University of California, San Francisco, San Francisco, Calif (F.C., C. Iriondo, V.P.); Berkeley Joint Graduate Group in Bioengineering, University of California, Berkeley, Berkeley, Calif (C. Iriondo); Department of Computer Science, University of Central Florida, Orlando, Fla (A.M., U.B.); Department of Radiology, Northwestern University, Chicago, Ill (U.B.); Department of Radiology, Columbia University, New York, NY (S.J.); Department of Computer Science, University of Copenhagen, Copenhagen, Denmark (M.P., C. Igel, E.B.D.); Department of Biomedical Engineering, Cleveland Clinic, Cleveland, Ohio (S.G., M.Y., X.L.); Department of Radiology, New York University Langone Health, New York, NY (C.M.D., R.R.); and Department of Biomedical Imaging and Image-guided Therapy, High-Field MR Centre, Medical University of Vienna, Vienna, Austria (V.J.).

Published: May 2021

Purpose: To organize a multi-institute knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression.

Materials and Methods: A dataset partition consisting of three-dimensional knee MRI from 88 retrospective patients at two time points (baseline and 1-year follow-up) with ground truth articular (femoral, tibial, and patellar) cartilage and meniscus segmentations was standardized. Challenge submissions and a majority-vote ensemble were evaluated against ground truth segmentations on a holdout test set using Dice score, average symmetric surface distance, volumetric overlap error, and coefficient of variation. Similarities among automated segmentations were measured using pairwise Dice coefficient correlations. Articular cartilage thickness was computed per scan and longitudinally. Correlation between thickness error and segmentation metrics was measured using the Pearson correlation coefficient. Two empirical upper bounds for ensemble performance were computed using combinations of model outputs that consolidated true positives and true negatives.
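As an illustration of two of the overlap metrics and the majority-vote ensemble named above, the following is a minimal NumPy sketch, not the challenge's evaluation code. The function names, the handling of empty masks, and the tie rule (a voxel is foreground only on a strict majority) are assumptions made for this example. Average symmetric surface distance and coefficient of variation are omitted.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def volumetric_overlap_error(pred, truth):
    """VOE = 1 - |A ∩ B| / |A ∪ B| for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # assumption: two empty masks have zero error
    return 1.0 - np.logical_and(pred, truth).sum() / union


def majority_vote(masks):
    """Label a voxel foreground when more than half of the masks do.

    Assumption: ties (possible with an even number of masks) go to background.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0) * 2 > stack.shape[0]


if __name__ == "__main__":
    # Toy 2 x 3 masks, invented purely for illustration (not challenge data).
    truth = [[1, 1, 0], [0, 1, 0]]
    model_a = [[1, 1, 0], [0, 0, 0]]
    model_b = [[1, 0, 0], [0, 1, 1]]
    model_c = [[1, 1, 1], [0, 1, 0]]

    ensemble = majority_vote([model_a, model_b, model_c])
    print("Dice, model A:", dice_score(model_a, truth))
    print("VOE, model A:", volumetric_overlap_error(model_a, truth))
    print("Dice, ensemble:", dice_score(ensemble, truth))
```

In this toy case each model makes a different error, so the vote recovers the ground truth mask exactly. When models make the same errors, as the high pairwise Dice correlations reported in the Results suggest, voting has little to correct.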

Results: Six teams submitted entries for the challenge. No differences were observed across any segmentation metrics for any tissues (P = .99) among the four top-performing networks. Dice coefficient correlations between network pairs were high (> 0.85). Per-scan thickness errors were negligible among networks (P = .99), and longitudinal changes showed minimal bias (< 0.03 mm). Low correlations (ρ < 0.41) were observed between segmentation metrics and thickness error. The majority-vote ensemble was comparable to top-performing networks (P = .99). Empirical upper-bound performances were similar for both combinations (P = .99).

Conclusion: Diverse networks learned to segment the knee similarly; high segmentation accuracy did not correlate with cartilage thickness accuracy, and voting ensembles did not exceed individual network performance.

See also the commentary by Elhalawani and Mak in this issue.

Keywords: Cartilage, Knee, MR-Imaging, Segmentation

© RSNA, 2020

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8231759
DOI: http://dx.doi.org/10.1148/ryai.2021200078

