We derive a general structure that encompasses important coefficients of interrater agreement such as the S-coefficient, Cohen's kappa, Scott's pi, Fleiss' kappa, Krippendorff's alpha, and Gwet's AC1. We show that these coefficients share the same set of assumptions about rater behavior; they differ only in how the unobserved category proportions are estimated. We incorporate Bayesian estimates of the category proportions and propose a new agreement coefficient with uniform prior beliefs. To correct for guessing in the process of item classification, the new coefficient emphasizes equal category probabilities when the observed frequencies are unstable because of a small sample, and lets the observed frequencies increasingly shape the coefficient as they become more stable. The proposed coefficient coincides with the S-coefficient in the hypothetical case of zero items and converges to Scott's pi, Fleiss' kappa, and Krippendorff's alpha as the number of items increases. We use simulation to show that the proposed coefficient is as good as extant coefficients when the category proportions are equal and that it performs better when the category proportions are substantially unequal.
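The coefficients named above all take the chance-correction form (Po − Pe)/(1 − Pe) and differ only in how the category proportions behind the expected chance agreement Pe are estimated. The Python sketch below illustrates this shared structure for two raters. The function name, the restriction to two raters, and the exact pseudo-count form (counts + 1)/(2n + k) used for the uniform-prior estimate are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def agreement_coefficients(ratings, k):
    """Chance-corrected agreement between two raters (illustrative sketch).

    ratings : (n_items, 2) array of category labels in {0, ..., k-1}
    k       : number of categories

    All coefficients share the form (Po - Pe) / (1 - Pe); they differ
    only in how the category proportions behind Pe are estimated.
    """
    ratings = np.asarray(ratings)
    n = ratings.shape[0]

    # Observed proportion of agreement.
    po = np.mean(ratings[:, 0] == ratings[:, 1])

    # Pooled category counts over both raters (2n classifications in total).
    counts = np.bincount(ratings.ravel(), minlength=k)

    # S-coefficient: assumes equal category probabilities 1/k.
    pe_s = 1.0 / k

    # Scott's pi: plugs in the pooled sample proportions.
    p_pooled = counts / (2 * n)
    pe_pi = np.sum(p_pooled ** 2)

    # Cohen's kappa: uses each rater's own marginal proportions.
    p1 = np.bincount(ratings[:, 0], minlength=k) / n
    p2 = np.bincount(ratings[:, 1], minlength=k) / n
    pe_kappa = np.sum(p1 * p2)

    # Assumed uniform-prior (Dirichlet(1, ..., 1)) posterior-mean estimate:
    # (counts + 1) / (2n + k). With n = 0 this reduces to 1/k, matching the
    # S-coefficient; as n grows it approaches the pooled proportions used
    # by Scott's pi, consistent with the limiting behavior in the abstract.
    p_bayes = (counts + 1) / (2 * n + k)
    pe_bayes = np.sum(p_bayes ** 2)

    chance_corrected = lambda pe: (po - pe) / (1 - pe)
    return {
        "S": chance_corrected(pe_s),
        "Scott_pi": chance_corrected(pe_pi),
        "Cohen_kappa": chance_corrected(pe_kappa),
        "uniform_prior": chance_corrected(pe_bayes),
    }


# Example: 10 items rated by two raters into 3 categories.
ratings = [[0, 0], [0, 0], [1, 1], [1, 2], [2, 2],
           [0, 0], [1, 1], [2, 2], [0, 1], [2, 2]]
print(agreement_coefficients(ratings, k=3))
```

With few items the pseudo-counts pull the estimated proportions toward 1/k, so the uniform-prior coefficient stays close to the S-coefficient; with many items the data dominate and it approaches Scott's pi, mirroring the convergence described in the abstract.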
DOI: http://dx.doi.org/10.1037/met0000183