A method for identifying extreme OSCE examiners.

Clin Teach

Evaluation Bureau, Medical Council of Canada, Ottawa, Ontario K1G 5A2, Canada.

Published: February 2013

Background: Performance assessments rely on human judgment, and are vulnerable to rater effects (e.g. leniency or harshness). Making valid inferences from performance ratings for high-stakes decisions requires the management of rater effects. A simple method for detecting extreme raters that does not require sophisticated statistical knowledge or software has been developed as part of the quality assurance process for objective structured clinical examinations (OSCEs). We believe it is applicable to a range of examinations that rely on human raters.

Methods: The method has three steps. First, extreme raters are identified by comparing individual rater means with the mean of all raters. A rater is deemed extreme if their mean is three standard deviations below (hawks) or above (doves) the overall mean. This criterion is adjustable. Second, the distribution of an extreme rater's scores is compared with the overall distribution for the station. This step mitigates a station effect. Third, the cohort of candidates seen by the rater is examined to ensure that any cohort effect is ruled out.
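
The first two steps described above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation. The function names, the data layout (a dictionary of scores per rater) and the sample data are hypothetical. The abstract does not say which standard deviation is used, so the sketch assumes the sample standard deviation of the rater means.

```python
from statistics import mean, median, stdev


def flag_extreme_raters(scores_by_rater, threshold=3.0):
    """Step 1: flag raters whose mean score is extreme.

    Assumption (not stated in the abstract): the spread is the
    sample standard deviation of the individual rater means.
    """
    rater_means = {r: mean(s) for r, s in scores_by_rater.items()}
    values = list(rater_means.values())
    overall = mean(values)
    spread = stdev(values)  # requires at least two raters
    flags = {}
    if spread == 0:
        return flags  # all rater means are identical; nobody is extreme
    for rater, m in rater_means.items():
        z = (m - overall) / spread
        if z <= -threshold:
            flags[rater] = "hawk"  # harsh rater
        elif z >= threshold:
            flags[rater] = "dove"  # lenient rater
    return flags


def compare_with_station(rater_scores, station_scores):
    """Step 2: put a flagged rater's scores next to all scores at the station."""
    return {
        "rater_mean": mean(rater_scores),
        "station_mean": mean(station_scores),
        "rater_median": median(rater_scores),
        "station_median": median(station_scores),
    }


# Made-up data: 19 raters who always award 6, one who always awards 2.
scores = {f"R{i}": [6, 6, 6] for i in range(1, 20)}
scores["R20"] = [2, 2, 2]
print(flag_extreme_raters(scores))
```

The `threshold` parameter reflects the statement that the criterion is adjustable. The third step, checking whether the rater simply saw an unusually weak or strong cohort, is left out because the abstract does not say how that comparison is made.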

Results And Implications: Of 3000+ raters, fewer than 0.3% have been identified as being extreme using the proposed criteria. Rater performance is being monitored on a regular basis, and the impact of these raters on candidate results will be considered before results are finalised. Extreme raters are contacted by the organisation to review their rating style. If this intervention fails to modify the rater's scoring pattern, the rater is no longer invited back. As more data are collected the organisation will assess them to inform the development of approaches to improve extreme rater performance.

Source
http://dx.doi.org/10.1111/j.1743-498X.2012.00607.x
