In this paper, we present and discuss two new measures of inter- and intra-rater agreement to assess the reliability of raters, and hence of their labeling, in multi-rater settings, which are common in the production of ground truth for machine learning models. Our proposal is more conservative than other existing agreement measures, as it considers a more articulated notion of agreement by chance, based on an empirical estimation of the precision (or reliability) of the individual raters involved. We discuss the measures in light of a realistic annotation task that involved 13 expert radiologists labeling the MRNet dataset.
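For context, the sketch below computes two conventional chance-corrected baselines that measures like these are usually compared against: Fleiss' kappa for inter-rater agreement among many raters, and Cohen's kappa for intra-rater agreement between two labeling sessions of the same rater. Both estimate agreement by chance only from label frequencies. The measures proposed in this paper instead base chance agreement on an empirical estimate of each rater's precision, and that method is not reproduced here. The function names and the input layout (one list of labels per item) are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for inter-rater agreement (standard baseline, not the paper's measure).

    ratings[i] holds the labels that all raters assigned to item i;
    every item must be rated by the same number of raters (>= 2).
    """
    if not ratings:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Fraction of ordered pairs of distinct raters that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_item_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / len(ratings)
    # Chance agreement from the pooled label frequencies of all raters.
    n_labels = len(ratings) * n_raters
    expected = sum((c / n_labels) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (observed - expected) / (1.0 - expected)


def cohen_kappa(session_a: Sequence[Hashable], session_b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two labelings of the same items.

    Used here for intra-rater agreement: one rater, two labeling sessions.
    """
    if not session_a or len(session_a) != len(session_b):
        raise ValueError("sessions must be non-empty and of equal length")
    n = len(session_a)
    observed = sum(a == b for a, b in zip(session_a, session_b)) / n
    freq_a, freq_b = Counter(session_a), Counter(session_b)
    # Chance agreement from each session's own label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sessions use one label")
    return (observed - expected) / (1.0 - expected)
```

For example, three raters labeling four items as `[["y","y","y"], ["n","n","n"], ["y","y","n"], ["y","n","n"]]` give a Fleiss' kappa of 1/3 (observed agreement 2/3, chance agreement 1/2), and a rater who labels `["y","y","n","n"]` in one session and `["y","n","n","n"]` in another gets a Cohen's kappa of 0.5. Because chance agreement here depends only on label frequencies, two raters of very different reliability are treated alike, which is the limitation a reliability-aware notion of chance agreement aims to address.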

Source: http://dx.doi.org/10.3233/SHTI200167

Similar Publications

This investigation evaluated the validity and reliability of the HUMAC360 linear position transducer (LPT) compared to the Tendo Sport Weightlifting Analyzer (TENDO) for measuring mean velocity (MV), peak velocity (PV), and displacement (D) during the bench press. Seventeen recreationally active individuals completed three visits. During visit one, participants were assessed for their one-repetition maximum (1RM) bench press.

Introduction: Foot ulcers are one of the most serious complications of diabetes, carrying significant risks of amputation and mortality. Peripheral arterial disease (PAD) is an important factor in the development and outcome of diabetic foot ulcers (DFU). Although prompt and accurate detection of PAD is critical to reduce complications, its diagnosis can be challenging with currently used bedside tests (such as ankle-brachial index and toe pressure) due to medial arterial calcification.

Background: In 2018, a nationwide survey carried out in 387 acute care hospitals in 16 of the 21 Italian regions made it possible to define an extended checklist for the participatory evaluation of person-centredness in hospital care. We aimed to validate a reduced set of core items for continuous use across the country.

Methods: Factor analysis was used to validate the construct of the checklist.

Optimizing hip MRI: enhancing image quality and elevating inter-observer consistency using deep learning-powered reconstruction.

BMC Med Imaging

January 2025

Department of Magnetic Resonance Imaging, The First Affiliated Hospital, Zhengzhou University, Zhengzhou, 450052, China.

Background: Conventional hip joint MRI requires long scan times, which compromises patient comfort and clinical efficiency. Previously, accelerated imaging techniques were constrained by a trade-off between noise and resolution. Deep learning-based reconstruction (DLR) has the potential to reduce scan time without compromising image quality.

Purpose: There is a growing interest in using computed tomography (CT) scans to opportunistically assess bone mineral density via Hounsfield units (HU). Previous studies have shown lower HU in patients with vertebral compression fractures (VCFs) and that HU can predict pre-existing VCFs. This study evaluated whether HU from CT scans can predict the number of prevalent VCFs.
