On understanding reliability for diagnostic tests.

Interv Pain Med

The University of Newcastle, PO Box 431, East Maitland, NSW, 2323, Australia.

Published: August 2022

For professional practice to be responsible, any diagnostic test used must be reliable, which means that its reliability needs to have been measured. The classical statistic for quantifying reliability is Kappa. Although Kappa can be readily obtained using a programmed calculator, deriving it step by step with an algorithm provides greater insight into what it actually measures and why. Kappa scores can be graded, with verbal descriptors applied to the different grades. However, those descriptors do not necessarily reflect the degree of skill required to achieve each grade. High levels of skill attract high Kappa scores, but scores described as fair or moderate are not necessarily flattering, because they can be achieved with questionable levels of skill. Various corrections can be applied to the calculation of Kappa in order to raise its value, and so improve the verbal descriptor of its grade, but these may not be legitimate or necessary. Low Kappa scores do not condemn a test, but they do raise questions about its reliability.
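The abstract does not reproduce the algorithm itself, so the following is only a minimal sketch of the standard Cohen's Kappa calculation for two observers applying a test with a positive or negative result. The function names, argument names, and example counts are illustrative assumptions, and the verbal grades follow the commonly cited Landis and Koch convention, which may not be the grading scheme the article uses.

```python
def cohen_kappa(both_pos, a_only_pos, b_only_pos, both_neg):
    """Cohen's Kappa from a 2x2 agreement table for observers A and B.

    Argument names are hypothetical; each is a count of patients.
    """
    n = both_pos + a_only_pos + b_only_pos + both_neg
    if n == 0:
        raise ValueError("agreement table is empty")

    # Observed agreement: proportion of cases where A and B gave the same result.
    observed = (both_pos + both_neg) / n

    # Agreement expected by chance, from each observer's rate of positive calls.
    a_pos = (both_pos + a_only_pos) / n
    b_pos = (both_pos + b_only_pos) / n
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is complete")

    # Kappa: agreement achieved beyond chance, as a fraction of what was available.
    return (observed - expected) / (1 - expected)


def verbal_grade(kappa):
    """Verbal descriptor, assuming the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical data: 85% raw agreement with a balanced spread of results.
balanced = cohen_kappa(20, 5, 10, 65)
# Hypothetical data: 92% raw agreement, but almost every result is negative.
skewed = cohen_kappa(1, 4, 4, 91)

print(round(balanced, 3), verbal_grade(balanced))
print(round(skewed, 3), verbal_grade(skewed))
```

In these made-up tables the second pair of observers agrees more often in raw terms yet scores a much lower Kappa, because nearly all of their agreement could have arisen by chance. That is the sense in which working through the steps shows what Kappa is measuring.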

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11372993
DOI: http://dx.doi.org/10.1016/j.inpm.2022.100124
