Measurement of inter-rater agreement for transient events using Monte Carlo sampled permutations.

Stat Med

Division of Pulmonary and Critical Care Medicine, School of Medicine, New York University, NY 10016, USA.

Published: February 2007

In this paper we demonstrate the adverse effect that serially observed data sequences containing transient events have on Cohen's kappa when it is used as an index of inter-rater agreement in detecting these events. We develop and use a Monte Carlo-based permutation technique to produce an empiric distribution of kappa in the presence of serial dependence. We find that empiric confidence intervals for kappa tend to be wider than parametrically derived intervals, and markedly so for longer events. We evaluate the effect of the number and length of events, and we describe and evaluate three permutation methods, each matched to a specific rating situation. Finally, we apply these techniques to measuring inter-rater agreement for sleep-disordered breathing events, transient events identified during nocturnal polysomnography, for which traditionally computed confidence intervals for kappa are incorrect.
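The abstract does not spell out the three permutation methods, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes each rater's scoring is a binary sequence of epochs (1 = event, 0 = no event), computes Cohen's kappa as (p_o - p_e) / (1 - p_e), and builds a Monte Carlo chance distribution of kappa by holding rater `a` fixed and randomly relocating rater `b`'s events while keeping their number and lengths. For contrast, it also shuffles individual epochs, which ignores serial dependence. The helper names (`event_lengths`, `relocate_events`, `null_kappas`) and the `preserve_events` flag are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length binary (0/1) rating sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                 # each rater's event prevalence
    p_exp = pa * pb + (1 - pa) * (1 - pb)           # agreement expected by chance
    if p_exp == 1.0:
        return float("nan")  # both raters constant and identical: kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)


def event_lengths(seq: Sequence[int]) -> List[int]:
    """Lengths of the runs of 1s (the transient events) in a binary sequence."""
    lengths, run = [], 0
    for x in seq:
        if x:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def relocate_events(lengths: List[int], n: int, rng: random.Random) -> List[int]:
    """Hypothetical helper: place the given events uniformly at random in a
    length-n sequence, keeping their number and lengths and leaving at least
    one 0 between consecutive events."""
    k = len(lengths)
    free = n - sum(lengths) - max(k - 1, 0)  # zeros not reserved as separators
    if free < 0:
        raise ValueError("events do not fit in a sequence of length n")
    order = lengths[:]
    rng.shuffle(order)  # random event order
    # Choose positions of the k event tokens among free zeros + k tokens.
    slots = sorted(rng.sample(range(free + k), k))
    seq: List[int] = []
    prev = -1
    for i, (slot, length) in enumerate(zip(slots, order)):
        gap = slot - prev - 1 + (1 if i > 0 else 0)  # free zeros plus one separator
        seq.extend([0] * gap)
        seq.extend([1] * length)
        prev = slot
    seq.extend([0] * (n - len(seq)))  # trailing zeros
    return seq


def null_kappas(a: Sequence[int], b: Sequence[int], draws: int,
                rng: random.Random, preserve_events: bool = True) -> List[float]:
    """Monte Carlo chance distribution of kappa with rater a held fixed."""
    lengths = event_lengths(b)
    out = []
    for _ in range(draws):
        if preserve_events:
            b_star = relocate_events(lengths, len(b), rng)
        else:
            b_star = list(b)
            rng.shuffle(b_star)  # epoch-wise shuffle: ignores serial dependence
        out.append(cohen_kappa(a, b_star))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, lengths = 1000, [50] * 5  # hypothetical: 5 events of 50 epochs each
    a = relocate_events(lengths, n, rng)
    b = relocate_events(lengths, n, rng)
    for preserve in (False, True):
        ks = null_kappas(a, b, 2000, rng, preserve)
        print(f"preserve_events={preserve}: sd of kappa = {statistics.stdev(ks):.3f}")
```

With long events, the event-preserving draws typically spread much more widely than the epoch-wise shuffle, which is consistent with the abstract's observation that intervals which ignore serial dependence are too narrow. Which relocation rule is appropriate depends on the rating situation; this sketch shows only one choice.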


Source: http://dx.doi.org/10.1002/sim.2568


Similar Publications

Objective: This study evaluated ResNet-50 and U-Net models for detecting and segmenting vertical misfit in dental implant crowns using periapical radiographic images.

Methods: Periapical radiographs of dental implant crowns were classified by two experts based on the presence of vertical misfit (reference group). The misfit area was manually annotated in images exhibiting vertical misfit.


ChatGPT and oral cancer: a study on informational reliability.

BMC Oral Health

January 2025

Faculty of Dentistry, Department of Dentomaxillofacial Radiology, Tokat Gaziosmanpasa University, Tokat, Turkey.

Background: Artificial intelligence (AI) and large language models (LLMs) like ChatGPT have transformed information retrieval, including in healthcare. ChatGPT, trained on diverse datasets, can provide medical advice but faces ethical and accuracy concerns. This study evaluates the accuracy of ChatGPT-3.


Objective: Coronal malalignment is a common feature of adult spinal deformity, and accurate classification is essential for diagnosis and treatment planning. However, variations in interpretation among clinicians can impact classification consistency. By assessing the reliability and applicability of these systems across different medical experts, this study seeks to establish a standardized approach to enhance clinical outcomes.


Introduction: The Scale for Assessment and Rating of Ataxia (SARA) is a widely used clinical rating scale in ataxia. Remote video assessments of SARA examinations are increasingly used to reduce variability through centralized ratings. Remote video assessments have a high agreement with in-person ratings, but the intra- and inter-rater reliability of remote video ratings has not been examined.


Dietary Food Record Charts and Digital Photography effectively estimate hospital meal consumption.

Clin Nutr ESPEN

January 2025

Division of Human Nutrition and Health, Nutritional Biology, Wageningen University & Research, HELIX (Building 124), Stippeneng 4, 6708 WE Wageningen, The Netherlands; Department of Intensive Care Medicine & Research, Gelderse Vallei Hospital, Willy Brandtlaan 10, 6716 RP Ede, The Netherlands. Electronic address:

Background & Aims: Optimal nutritional intake is essential to support nutritional status and improve recovery in hospital patients. To monitor adequate food intake in patients, reliable and accessible methods to quantify patient food intake accurately are needed. The present study aims to compare the accuracy of two methods, Food Record Charts (FRCs) and Digital Photography (DP), in estimating food intake with the gold standard of Weighed Food Records (WFRs).

