Pass/fail decisions and standards: the impact of differential examiner stringency on OSCE outcomes.

Adv Health Sci Educ Theory Pract

School of Medicine, Leeds Institute of Medical Education, University of Leeds, LS29JT, Leeds, UK.

Published: May 2022

Variation in examiner stringency is a recognised problem in many standardised summative assessments of performance, such as the OSCE. The stated strength of the OSCE is that such error might largely balance out over the exam as a whole. This study uses linear mixed models to estimate the impact of different factors (examiner, station, candidate and exam) on station-level total domain score and, separately, on a single global grade. The exam data are from 442 separate administrations of an 18-station OSCE for international medical graduates who want to work in the National Health Service in the UK. We find that variation due to examiner is approximately twice as large for domain scores as it is for grades (16% vs. 8%), with smaller residual variance in the former (67% vs. 76%). Combined estimates of exam-level (relative) reliability across all data are 0.75 and 0.69 for domain scores and grades respectively. The correlation between two separate estimates of stringency for individual examiners (one for grades and one for domain scores) is relatively high (r = 0.76), implying that examiners are generally quite consistent in their stringency between these two assessments of performance. Cluster analysis indicates that examiners fall into two broad groups characterised as hawks or doves on both measures. At the exam level, correcting for examiner stringency produces systematically lower cut-scores under borderline regression standard setting than using the raw marks. In turn, such a correction would produce higher pass rates, although meaningful direct comparisons are challenging to make. As in other studies, this work shows that OSCEs and other standardised performance assessments are subject to substantial variation in examiner stringency, and require sufficient domain sampling to ensure quality of pass/fail decision-making is at least adequate.
More, perhaps qualitative, work is needed to understand better how examiners might score similarly (or differently) between the awarding of station-level domain scores and global grades. The issue of the potential systematic bias of borderline regression evidenced for the first time here, with sources of error producing cut-scores higher than they should be, also needs more investigation.
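
For readers unfamiliar with borderline regression standard setting, the following is a minimal sketch of how a station-level cut-score is commonly derived: station scores are regressed on global grades, and the cut-score is the score the fitted line predicts at the borderline grade. The function name, the 0 to 3 grade scale, the choice of 1 as the borderline grade and the data are all illustrative assumptions, not details taken from the study.

```python
import numpy as np


def borderline_regression_cut_score(scores, grades, borderline_grade=1.0):
    """Return the station score predicted at the borderline grade.

    Fits an ordinary least-squares line (score = intercept + slope * grade)
    and evaluates it at `borderline_grade`. The grade coding is an assumption.
    """
    scores = np.asarray(scores, dtype=float)
    grades = np.asarray(grades, dtype=float)
    # np.polyfit with deg=1 returns the coefficients as [slope, intercept].
    slope, intercept = np.polyfit(grades, scores, deg=1)
    return float(intercept + slope * borderline_grade)


# Hypothetical marks for one station (not data from the study).
# Assumed grade coding: 0 = fail, 1 = borderline, 2 = pass, 3 = good.
grades = [0, 1, 1, 2, 2, 2, 3, 3]
scores = [4, 6, 7, 8, 9, 9, 11, 12]

station_cut = borderline_regression_cut_score(scores, grades)
print(f"Station cut-score at the borderline grade: {station_cut:.2f}")
```

Because the cut-score is read off a fitted line, anything that systematically shifts the scores or grades, including examiner stringency, can shift the cut-score as well. This is the mechanism the abstract refers to when it compares raw and stringency-corrected marks.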

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117341
DOI: http://dx.doi.org/10.1007/s10459-022-10096-9

Publication Analysis

Top Keywords (keyword: count)

examiner stringency: 16
variation examiner: 12
domain scores: 12
assessments performance: 8
scores grades: 8
borderline regression: 8
examiner: 6
stringency: 6
domain: 5
pass/fail decisions: 4

Similar Publications

This study evaluates mathematical tools (principal component analysis, dynamic time warping, and the Kolmogorov-Smirnov hypothesis test) to analyse global and local data from dynamic signatures to reduce subjectivity and increase the reproducibility of handwriting examination using a two-step approach. A dataset composed of 1 800 genuine signature samples, 870 simulated signatures, and 60 disguises (30 formally similar or "autosimulated" and 30 random but different from usual) provided by 30 volunteers was collected. The first step involved global data analysis using principal component analysis and a hypothesis test performed for 62 global characteristics, and associations of these characteristics were analysed through calculations of multivariate distance followed by a hypothesis test.

Background: Recent advancements in understanding plasma extracellular vesicles (EVs) and their role in disease biology have provided additional unique insights into the study of Colorectal Cancer (CRC).

Methods: This study aimed to gain biological insights into disease progression from plasma-derived extracellular vesicle proteomic profiles of 80 patients (20 from each CRC stage I-IV) against 20 healthy age- and sex-matched controls using a high-resolution SWATH-MS proteomics with a reproducible centrifugation method to isolate plasma EVs.

Results: We applied the High-Stringency Human Proteome Project (HPP) guidelines for SWATH-MS analysis, which refined our initial EV protein identification from 1362 proteins (10,993 peptides) to a more reliable and confident subset of 853 proteins (6231 peptides).

Introduction: Ensuring examiner equivalence across distributed assessment locations is a priority within distributed Objective Structured Clinical Exams (OSCEs) but is challenging as examiners are typically fully nested within locations (i.e. no overlap in performances seen by different groups of examiners).

Climate change dynamics for global energy security and equity: Evidence from policy stringency drivers.

J Environ Manage

November 2024

CUNY - Brooklyn College and the Graduate Center, New York, USA; Ateneo de Manila University School of Government, Manila, The Philippines; University of Economics Ho Chi Minh City, Ho Chi Minh City, Vietnam.

This study investigates the dynamic interplay between financial integration, political stability, infrastructure, and global integration in enhancing Energy Security (ES) and Energy Equity (EE) across 50 economies from 2006 to 2018. It addresses gaps in understanding how socio-economic, political, and technological factors collectively influence ES and EE during the global transition from fossil fuels to renewable energy sources. The research aims to reveal the complex relationships and potential trade-offs between energy sustainability, economic growth, and equitable energy distribution.

Influence of pairing in examiner leniency and stringency ('hawk-dove effect') in part II of the European Diploma of Anaesthesiology and Intensive Care: A cohort study.

Eur J Anaesthesiol

December 2024

From the Department of Anaesthesia, ITU and Pain Management, Mater Dei Hospital, Msida, Malta (SS), Department of Anaesthesiology, Erasmus University Medical Centre, Rotterdam, the Netherlands (MK), European Society of Anaesthesiology and Intensive Care, Brussels, Belgium (MK, BA, HS, RDL, JBE), Department of Anaesthesia, University Hospital of Wales, Cardiff, UK (BA), Institute for Medical Education, University of Bern, Bern, Switzerland (JBE), CINTESIS@RISE - Centre for Health Technology and Services Research, Porto, Portugal (JBE) and Institute of Anaesthesiology and Intensive Care, Salemspital, Hirslanden Medical Group, Bern, Switzerland (JBE).

Article Synopsis
  • The study examines the impact of examiner pairing on grading variances in the EDAIC Part II examination, focusing on the leniency and strictness of different examiner pairs.
  • Utilizing data from 325 examiners over three years, the research reveals that most examiner pairs had only slight differences in scoring, indicating a general consistency in leniency.
  • The findings highlight the potential 'hawk-dove effect', suggesting that different examiner combinations can significantly affect candidate performance and outcomes in the exam.
