Towards a more nuanced conceptualisation of differential examiner stringency in OSCEs.

Adv Health Sci Educ Theory Pract

School of Medicine, University of Leeds, Leeds, LS2 JT, UK.

Published: July 2024

Quantitative measures of systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In this work, a large candidate-level exam dataset is analysed to develop a more sophisticated understanding of examiner stringency. Station scores are modelled based on global grades, with each candidate, station and examiner allowed to vary in their ability/stringency/difficulty in the modelling. In addition, examiners are also allowed to vary in how they discriminate across grades; to our knowledge, this is the first time this has been investigated. Results show that examiners contribute strongly to variance in scoring in two distinct ways: via the traditional conception of score stringency (34% of score variance), but also in how they discriminate in scoring across grades (7%). As one might expect, candidate and station account for only a small amount of score variance at the station level once candidate grades are accounted for (3% and 2% respectively), with the remainder being residual (54%). Investigation of impacts on station-level candidate pass/fail decisions suggests that examiner differential stringency effects combine to give false positive (candidates passing in error) and false negative (failing in error) rates in stations of around 5% each, but at the exam level these reduce to 0.4% and 3.3% respectively. This work adds to our understanding of examiner behaviour by demonstrating that examiners can vary in qualitatively different ways in their judgments. For institutions, it emphasises the key message that it is important to sample widely from the examiner pool via sufficient stations to ensure OSCE-level decisions are sufficiently defensible. It also suggests that examiner training should include discussion of global grading, and the combined effect of scoring and grading on candidate outcomes.
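
The reasoning in the abstract, that examiner effects on pass/fail decisions shrink when scores are averaged over many stations, can be illustrated with a deliberately simplified Monte Carlo sketch. This is not the authors' model or data: the function name `simulate_flip_rates`, the 0 to 4 grade scale, the pass mark and every standard deviation below are hypothetical values chosen only for illustration, and a "fair" score here simply means the same score with the two examiner effects (stringency and discrimination across grades) removed.

```python
import random


def simulate_flip_rates(n_candidates=2000, n_stations=12, seed=0):
    """Return (station_rate, exam_rate): the share of pass/fail decisions
    that change once examiner effects are added to the scores."""
    rng = random.Random(seed)

    # All values below are hypothetical and chosen only for illustration.
    base, slope = 40.0, 10.0     # expected score = base + slope * grade
    sd_stringency = 5.0          # examiner shift in overall score level
    sd_discrimination = 1.0      # examiner shift in score gap per grade
    sd_residual = 6.0            # unexplained station-level noise
    pass_mark = 60.0             # expected score at the middle grade (2)

    station_flips = 0
    exam_flips = 0
    for _ in range(n_candidates):
        ability = rng.gauss(2.2, 0.8)   # candidate's typical grade (0 to 4)
        fair_scores, observed_scores = [], []
        for _ in range(n_stations):
            grade = min(4, max(0, round(ability + rng.gauss(0, 0.7))))
            fair = base + slope * grade + rng.gauss(0, sd_residual)
            # A fresh examiner for every station: one offset for stringency,
            # one for how steeply scores rise across grades.
            stringency = rng.gauss(0, sd_stringency)
            discrimination = rng.gauss(0, sd_discrimination)
            observed = fair + stringency + discrimination * (grade - 2)
            fair_scores.append(fair)
            observed_scores.append(observed)
            station_flips += (fair >= pass_mark) != (observed >= pass_mark)
        # Exam-level decision: compare the mean score across stations.
        fair_mean = sum(fair_scores) / n_stations
        observed_mean = sum(observed_scores) / n_stations
        exam_flips += (fair_mean >= pass_mark) != (observed_mean >= pass_mark)

    station_rate = station_flips / (n_candidates * n_stations)
    exam_rate = exam_flips / n_candidates
    return station_rate, exam_rate


if __name__ == "__main__":
    station_rate, exam_rate = simulate_flip_rates()
    print(f"station-level decisions changed: {station_rate:.1%}")
    print(f"exam-level decisions changed:    {exam_rate:.1%}")
```

Under these assumed settings the exam-level rate should come out lower than the station-level rate, because independent examiner offsets partly cancel when averaged across stations. The sketch does not separate false positives from false negatives and does not fit a mixed model, so its numbers are not comparable with the percentages reported in the abstract.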

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11208245
DOI: http://dx.doi.org/10.1007/s10459-023-10289-w

Similar Publications

Background: Recent advancements in understanding plasma extracellular vesicles (EVs) and their role in disease biology have provided additional unique insights into the study of Colorectal Cancer (CRC).

Methods: This study aimed to gain biological insights into disease progression from plasma-derived extracellular vesicle proteomic profiles of 80 patients (20 from each CRC stage I-IV) against 20 healthy age- and sex-matched controls using a high-resolution SWATH-MS proteomics with a reproducible centrifugation method to isolate plasma EVs.

Results: We applied the High-Stringency Human Proteome Project (HPP) guidelines for SWATH-MS analysis, which refined our initial EV protein identification from 1362 proteins (10,993 peptides) to a more reliable and confident subset of 853 proteins (6231 peptides).

Introduction: Ensuring examiner equivalence across distributed assessment locations is a priority within distributed Objective Structured Clinical Exams (OSCEs) but is challenging as examiners are typically fully nested within locations (i.e. no overlap in performances seen by different groups of examiners).

Climate change dynamics for global energy security and equity: Evidence from policy stringency drivers.

J Environ Manage

November 2024

CUNY-Brooklyn College and the Graduate Center, New York, USA; Ateneo de Manila University School of Government, Manila, The Philippines; University of Economics Ho Chi Minh City, Ho Chi Minh City, Vietnam.

This study investigates the dynamic interplay between financial integration, political stability, infrastructure, and global integration in enhancing Energy Security (ES) and Energy Equity (EE) across 50 economies from 2006 to 2018. It addresses gaps in understanding how socio-economic, political, and technological factors collectively influence ES and EE during the global transition from fossil fuels to renewable energy sources. The research aims to reveal the complex relationships and potential trade-offs between energy sustainability, economic growth, and equitable energy distribution.

Influence of pairing in examiner leniency and stringency ('hawk-dove effect') in part II of the European Diploma of Anaesthesiology and Intensive Care: A cohort study.

Eur J Anaesthesiol

December 2024

From the Department of Anaesthesia, ITU and Pain Management, Mater Dei Hospital, Msida, Malta (SS), Department of Anaesthesiology, Erasmus University Medical Centre, Rotterdam, the Netherlands (MK), European Society of Anaesthesiology and Intensive Care, Brussels, Belgium (MK, BA, HS, RDL, JBE), Department of Anaesthesia, University Hospital of Wales, Cardiff, UK (BA), Institute for Medical Education, University of Bern, Bern, Switzerland (JBE), CINTESIS@RISE - Centre for Health Technology and Services Research, Porto, Portugal (JBE) and Institute of Anaesthesiology and Intensive Care, Salemspital, Hirslanden Medical Group, Bern, Switzerland (JBE).

Article Synopsis
  • The study examines the impact of examiner pairing on grading variances in the EDAIC Part II examination, focusing on the leniency and strictness of different examiner pairs.
  • Utilizing data from 325 examiners over three years, the research reveals that most examiner pairs had only slight differences in scoring, indicating a general consistency in leniency.
  • The findings highlight the potential 'hawk-dove effect', suggesting that different examiner combinations can significantly affect candidate performance and outcomes in the exam.

Background: High stakes examinations used to credential trainees for independent specialist practice should be evaluated periodically to ensure defensible decisions are made. This study aims to quantify the College of Intensive Care Medicine of Australia and New Zealand (CICM) Hot Case reliability coefficient and evaluate contributions to variance from candidates, cases and examiners.

Methods: This retrospective, de-identified analysis of CICM examination data used descriptive statistics and generalisability theory to evaluate the reliability of the Hot Case examination component.
