Request: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=19000873&retmode=xml&tool=Litmetric&email=readroberts32@gmail.com&api_key=61f08fa0b96a73de8c900d749fcb997acc09

PMID: 19000873 (record completed 2009-01-23; last revised 2021-10-20).
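The record below was retrieved with the single efetch call above. As a minimal sketch of reproducing the request with only the Python standard library: the element paths assume the standard PubMed efetch XML layout (PubmedArticleSet/PubmedArticle/MedlineCitation/Article), and the email value is a placeholder to replace with your own contact address.

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    # NCBI E-utilities efetch endpoint, as in the logged request above.
    EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

    params = {
        "db": "pubmed",
        "id": "19000873",        # PMID of the record below
        "retmode": "xml",
        "tool": "Litmetric",     # tool/email identify the client to NCBI
        "email": "you@example.org",  # placeholder: use your own address
    }

    with urllib.request.urlopen(EFETCH + "?" + urllib.parse.urlencode(params)) as resp:
        root = ET.fromstring(resp.read())

    # Title and structured abstract live under MedlineCitation/Article.
    article = root.find(".//MedlineCitation/Article")
    print(article.findtext("ArticleTitle"))
    for part in article.findall("Abstract/AbstractText"):
        label = part.get("Label")
        text = "".join(part.itertext()).strip()
        print(f"{label}: {text}" if label else text)

Supplying an api_key, as the logged request does, raises the E-utilities rate limit from 3 to 10 requests per second; without one, stay under 3 requests per second. The cleaned-up record follows.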
Acad Radiol (Academic Radiology). 2008 Dec;15(12):1567-1573. doi: 10.1016/j.acra.2008.07.011. eISSN 1878-4046.

Agreement of the order of overall performance levels under different reading paradigms.

RATIONALE AND OBJECTIVES: To investigate the consistency of the ordering of performance levels when interpreting mammograms under three different reading paradigms.

MATERIALS AND METHODS: We performed a retrospective observer study in which nine experienced radiologists rated an enriched set of mammography examinations that they personally had read in the clinic ("individualized") mixed with a set that none of them had read in the clinic ("common set"). Examinations were interpreted under three reading paradigms: binary, using the screening Breast Imaging Reporting and Data System (BI-RADS); receiver operating characteristic (ROC); and free-response ROC (FROC). Performance in discriminating between cancer and noncancer findings under each paradigm was summarized using Youden's index/2 + 0.5 (binary), the nonparametric area under the ROC curve (AUC), and an overall FROC index (JAFROC-2). Pearson correlation coefficients were then computed to assess consistency in the ordering of observers' performance levels, and the statistical significance of the computed correlation coefficients was assessed using bootstrap confidence intervals obtained by resampling sets of examination-specific observations.

RESULTS: All but one of the computed pairwise correlation coefficients were larger than 0.66 and significantly different from zero. The correlation between the overall performance measures under the binary and ROC paradigms was the lowest (0.43) and was not significantly different from zero (95% confidence interval, -0.078 to 0.733).

CONCLUSIONS: The use of different evaluation paradigms in the laboratory tends to lead to consistent ordering of observers' overall performance levels. However, one should recognize that conceptually similar performance indexes resulting from different paradigms often measure different performance characteristics; thus, disagreements are not only possible but frequently quite natural.

Authors: Gur, David; Bandos, Andriy I; Klym, Amy H; Cohen, Cathy S; Hakim, Christiane M; Hardesty, Lara A; Ganott, Marie A; Perrin, Ronald L; Poller, William R; Shah, Ratan; Sumkin, Jules H; Wallace, Luisa P; Rockette, Howard E.

Affiliation: Department of Radiology, University of Pittsburgh, 3362 Fifth Avenue, Pittsburgh, PA 15213-3180, USA. gurd@upmc.edu

Grant support: R01 EB002106, R01 EB003503, R01 EB006388, and EB00350 (NIBIB NIH HHS, United States).

Publication types: Journal Article; Research Support, N.I.H., Extramural.
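Two of the abstract's three summary indices are simple to compute: the binary index is Youden's J = sensitivity + specificity - 1 rescaled to J/2 + 0.5 = (sensitivity + specificity)/2, and the nonparametric AUC is the Mann-Whitney pairwise-comparison estimate. The sketch below illustrates both, together with an examination-level percentile bootstrap CI for the Pearson correlation between readers' binary and ROC indices. It is an illustration of the general approach only, not the authors' exact resampling scheme; it omits JAFROC-2, runs on toy data, and all variable names and the (readers x examinations) array shapes are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def binary_index(truth, calls):
        """Youden's index / 2 + 0.5, i.e. (sensitivity + specificity) / 2."""
        truth = np.asarray(truth, bool)
        calls = np.asarray(calls, bool)
        sens = calls[truth].mean()         # fraction of cancers recalled
        spec = (~calls[~truth]).mean()     # fraction of noncancers not recalled
        return (sens + spec) / 2.0

    def nonparametric_auc(truth, ratings):
        """Empirical (Mann-Whitney) area under the ROC curve."""
        truth = np.asarray(truth, bool)
        ratings = np.asarray(ratings, float)
        pos, neg = ratings[truth], ratings[~truth]
        # P(pos rating > neg rating) + 0.5 * P(tie), over all pos/neg pairs.
        return ((pos[:, None] > neg[None, :]).mean()
                + 0.5 * (pos[:, None] == neg[None, :]).mean())

    def bootstrap_corr_ci(truth, binary_calls, roc_ratings, n_boot=2000, alpha=0.05):
        """Percentile bootstrap CI for the Pearson correlation between readers'
        binary and ROC indices, resampling examinations with replacement.
        Assumes every resample contains both cancer and noncancer exams."""
        n_readers, n_exams = roc_ratings.shape
        corrs = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n_exams, n_exams)
            t = truth[idx]
            x = [binary_index(t, binary_calls[r, idx]) for r in range(n_readers)]
            y = [nonparametric_auc(t, roc_ratings[r, idx]) for r in range(n_readers)]
            corrs[b] = np.corrcoef(x, y)[0, 1]
        return np.quantile(corrs, [alpha / 2, 1 - alpha / 2])

    # Toy demonstration: 9 readers, 120 examinations, 40 containing cancer.
    truth = np.arange(120) < 40
    roc_ratings = rng.normal(size=(9, 120)) + truth * rng.uniform(0.5, 1.5, (9, 1))
    binary_calls = roc_ratings > 0.8
    lo, hi = bootstrap_corr_ci(truth, binary_calls, roc_ratings)
    print(f"95% bootstrap CI for the binary-vs-ROC correlation: [{lo:.3f}, {hi:.3f}]")

Resampling whole examinations (columns) rather than individual ratings preserves the between-reader correlation induced by reading the same cases, which is why the interval is taken over exam-level resamples.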
Journal info: Acad Radiol; country: United States; NLM unique ID: 9440159; ISSN (linking): 1076-6332; citation subset: IM.

MeSH terms: Breast Neoplasms / diagnostic imaging; Data Interpretation, Statistical; Female; Humans; Image Interpretation, Computer-Assisted / methods; Mammography / methods; Observer Variation; Professional Competence; ROC Curve; Reproducibility of Results; Sensitivity and Specificity; Task Performance and Analysis.
History: received 2008-06-10; revised 2008-07-15; accepted 2008-07-15; Entrez 2008-11-13. Publication status: ppublish.

Identifiers: PMID 19000873; NIHMSID NIHMS80133; PMCID PMC2601626; DOI 10.1016/j.acra.2008.07.011; PII S1076-6332(08)00411-X.