Publications by authors named "Matt Homer"

Summative assessments are often underused for feedback, despite being rich in data on students' applied knowledge and clinical and professional skills. To better inform teaching and student support, this study aims to gain insights from summative assessments by profiling students' performance patterns and identifying those students missing the basic knowledge and skills in medical specialities essential for their future careers. We use Latent Profile Analysis to classify a senior undergraduate year group (n = 295) based on their performance in an applied knowledge test (AKT) and OSCE, in which items and stations are pre-classified across five specialities (e.
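Latent Profile Analysis on continuous indicators is closely related to fitting a Gaussian mixture model, so the core of the profiling idea can be sketched with a short EM loop. This is a hedged, minimal one-indicator, two-profile illustration on simulated data; the group sizes, score distributions and profile count below are invented and are not the study's:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated exam scores for a cohort of 295 with two hypothetical latent groups
scores = np.concatenate([rng.normal(55, 5, 200), rng.normal(75, 5, 95)])

# EM for a two-component 1-D Gaussian mixture (the core of a simple LPA)
mu = np.array([50.0, 80.0])   # initial profile means
sd = np.array([10.0, 10.0])   # initial profile SDs
w = np.array([0.5, 0.5])      # initial mixing weights
for _ in range(100):
    # E-step: responsibility of each profile for each candidate
    dens = w * np.exp(-0.5 * ((scores[:, None] - mu) / sd) ** 2) / sd
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update weights, means and SDs from the responsibilities
    n_k = resp.sum(axis=0)
    w = n_k / len(scores)
    mu = (resp * scores[:, None]).sum(axis=0) / n_k
    sd = np.sqrt((resp * (scores[:, None] - mu) ** 2).sum(axis=0) / n_k)

profile = resp.argmax(axis=1)  # hard assignment to the most likely profile
```

In practice LPA is fitted over a vector of indicators (e.g. per-speciality AKT and OSCE scores) with the number of profiles chosen by fit statistics, but the E/M alternation is the same.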

Quantitative measures of systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In this work, a large candidate-level exam dataset is analysed to develop a more sophisticated understanding of examiner stringency.

Introduction: Alongside the usual exam-level cut-score requirement, the use of a conjunctive minimum number of stations passed (MNSP) standard in OSCE-type assessments is common practice in some parts of the world. Typically, the MNSP is fixed in advance with little justification and does not vary from one administration to another in a particular setting, which is not congruent with best assessment practice for high-stakes examinations. In this paper, we investigate empirically four methods of setting such a standard in an examinee-centred (i.

Article Synopsis
  • The study examines the impact of two different station formats—interview and role-play—on the quality of Multiple Mini Interviews (MMIs) used for selecting candidates in medical schools.
  • Data from 11,761 applicants over 8 years was analyzed, focusing on scores, discrimination, and predictive validity based on candidates' subsequent performance in clerkships.
  • Results showed that while role-play stations had slightly lower scores than interview stations, both formats displayed similar psychometric properties, indicating that the choice of format does not significantly affect the evaluation quality of MMIs.

Variation in examiner stringency is a recognised problem in many standardised summative assessments of performance such as the OSCE. The stated strength of the OSCE is that such error might largely balance out over the exam as a whole. This study uses linear mixed models to estimate the impact of different factors (examiner, station, candidate and exam) on station-level total domain score and, separately, on a single global grade.
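The paper fits linear mixed models; a much-simplified sketch of the underlying idea is to estimate each examiner's stringency as the deviation of their mean awarded score from the grand mean. The simulation below uses invented effect sizes purely to show the shape of the calculation, not the authors' modelling:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cand, n_stat = 100, 5
true_exam_effect = {0: -3.0, 1: 0.0, 2: 3.0}  # hypothetical stringency offsets

rows = []
for c in range(n_cand):
    for s in range(n_stat):
        e = int(rng.integers(3))               # examiner allocated to this encounter
        obs = 60 + rng.normal(0, 5) + true_exam_effect[e]
        rows.append((c, s, e, obs))
cand, stat, exam, score = (np.array(col) for col in zip(*rows))

grand = score.mean()
# Crude stringency estimate: examiner's mean score relative to the grand mean
stringency = {e: score[exam == e].mean() - grand for e in range(3)}
```

A mixed model additionally partitions variance across examiner, station, candidate and exam simultaneously, and shrinks estimates for examiners with few observations; this naive version recovers the offsets only because assignment here is random and balanced.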

Introduction: Many institutions require candidates to achieve a minimum number of OSCE stations passed (MNSP) in addition to the aggregate pass mark. The stated rationale is usually that this conjunctive standard prevents excessive degrees of compensation across an assessment. However, there is a lack of consideration and discussion of this practice in the medical education literature.
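The conjunctive rule itself is simple to state: a candidate must clear the aggregate cut-score and pass at least the MNSP. A minimal sketch (function name, cut-scores and score profiles are invented for illustration):

```python
def osce_result(station_scores, station_cuts, aggregate_cut, mnsp):
    """Pass only if BOTH the aggregate and the MNSP standard are met."""
    stations_passed = sum(s >= c for s, c in zip(station_scores, station_cuts))
    return sum(station_scores) >= aggregate_cut and stations_passed >= mnsp

# A compensating candidate: high aggregate, but only two stations passed
compensator = osce_result([80, 80, 40, 40, 40], [50] * 5, 250, mnsp=3)
# A consistent candidate: lower peaks, but every station passed
consistent = osce_result([60, 60, 60, 60, 60], [50] * 5, 250, mnsp=3)
```

The first candidate meets the aggregate standard (280 ≥ 250) yet fails under the conjunctive rule, which is exactly the "excessive compensation" the MNSP is intended to prevent.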

Variation in examiner stringency is an ongoing problem in many performance settings such as OSCEs, and is usually conceptualised and measured based on the scores/grades examiners award. Under borderline regression, the standard within a station is set using checklist/domain scores and global grades acting in combination. This complexity requires a more nuanced view of what stringency might mean when considering sources of variation in station cut-scores.

There has been a long-running debate about the validity of item-based checklist scoring of performance assessments like OSCEs. In recent years, the conception of a checklist has developed from its dichotomous inception into a more 'key-features' and/or chunked approach, where 'items' can be weighted differently, but the literature does not always reflect these broader conceptions. We consider theoretical, design and (clinically trained) assessor issues related to differential item weighting in checklist scoring of OSCE stations.

The borderline regression method (BRM) is considered problematic in small-cohort OSCEs (e.g. n < 50), with institutions often relying instead on item-centred standard-setting approaches, which can be resource intensive and lack defensibility in performance tests.
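For context, the BRM calculation at station level is a simple regression: checklist scores are regressed on global grades, and the cut-score is the predicted score at the "borderline" grade. A sketch on invented small-cohort data (the grade scale, scores and borderline point are hypothetical):

```python
import numpy as np

# Hypothetical station data for a small cohort (n = 12):
# global grade on a 0 (fail) to 4 (excellent) scale, and checklist score
grades = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
scores = np.array([8, 11, 13, 15, 16, 18, 20, 21, 23, 25, 26, 28])

# Regress checklist score on global grade (ordinary least squares, degree 1)
slope, intercept = np.polyfit(grades, scores, 1)

# Cut-score = predicted checklist score at the grade defined as "borderline"
borderline_grade = 1.0
cut_score = slope * borderline_grade + intercept
```

With so few data points the regression line, and hence the cut-score, is highly sensitive to individual candidates, which is the small-cohort instability the abstract refers to.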

Background: Although averaging across multiple examiners' judgements reduces unwanted overall score variability in objective structured clinical examinations (OSCE), designs involving several parallel circuits of the OSCE require that different examiner cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner-cohort effects in distributed or national examinations that could compromise fairness or patient safety, but despite their importance, these effects are rarely investigated because fully nested assessment designs make them very difficult to study. We describe initial use of a new method to measure and adjust for examiner-cohort effects on students' scores.

Introduction: In recent decades, there has been a move towards standardized models of assessment where all students sit the same test (e.g. OSCE).

Introduction: Many standard setting procedures focus on the performance of the "borderline" group, defined through expert judgments by assessors. In performance assessments such as Objective Structured Clinical Examinations (OSCEs), these judgments usually apply at the station level.

Methods And Results: Using largely descriptive approaches, we analyze the assessment profile of OSCE candidates at the end of a five-year undergraduate medical degree program to investigate the consistency of the borderline group across stations.

Context: There is a growing body of research investigating assessor judgments in complex performance environments such as OSCE examinations. Post hoc analysis can be employed to identify some elements of "unwanted" assessor variance. However, the impact of individual, apparently "extreme" assessors on OSCE quality, assessment outcomes and pass/fail decisions has not been previously explored.

Introduction: It is known that test-centered methods for setting standards in knowledge tests (e.g. Angoff or Ebel) are problematic, with expert judges not able to consistently predict the difficulty of individual items.
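For readers unfamiliar with the test-centred methods being criticised, the classic Angoff calculation can be sketched in a few lines: each judge estimates the probability that a borderline candidate answers each item correctly, and the cut-score is the sum over items of the mean judge estimate. All ratings below are invented:

```python
# Hypothetical Angoff ratings: one row per item, one column per judge,
# each entry = judged P(borderline candidate answers this item correctly)
judge_estimates = [
    [0.6, 0.8, 0.5],   # item 1
    [0.4, 0.5, 0.3],   # item 2
    [0.7, 0.9, 0.8],   # item 3
]

item_means = [sum(ratings) / len(ratings) for ratings in judge_estimates]
cut_score = sum(item_means)   # expected score of a borderline candidate
```

The abstract's point is that the judged probabilities feeding this sum are themselves unreliable: judges cannot consistently predict item difficulty, so the resulting cut-score inherits that error.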

With growing evidence for the positive health outcomes associated with a plant-based diet, the study's purpose was to examine the potential of shifting adolescents' food choices towards plant-based foods. Using the real-world setting of a school canteen, a set of small changes to the choice architecture was designed and deployed in a secondary school in Yorkshire, England. Focussing on designated food items (whole fruit, fruit salad, vegetarian daily specials, and sandwiches containing salad), the changes were implemented for six weeks.

Background: The borderline regression method (BRM) is a widely accepted standard-setting method for OSCEs. However, it is unclear whether this method is appropriate for use with small cohorts (e.g.

Background: When measuring assessment quality, increasing focus is placed on the value of station-level metrics in the detection and remediation of problems in the assessment.

Aims: This article investigates how disparity between checklist scores and global grades in an Objective Structured Clinical Examination (OSCE) can provide powerful new insights at the station level, and develops metrics to indicate when such disparity is a problem.

Method: This retrospective study uses OSCE data from multiple examinations to investigate the extent to which these new measurements of disparity complement existing station-level metrics.
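The paper's own metrics are not reproduced here, but one simple disparity index of this general kind is the per-candidate difference between standardised checklist scores and standardised global grades within a station. The station data below are invented for illustration:

```python
import numpy as np

# Hypothetical results for one station: checklist scores and global grades
scores = np.array([14.0, 18.0, 22.0, 9.0, 25.0, 16.0, 20.0, 12.0])
grades = np.array([2, 3, 4, 1, 4, 2, 3, 1])

def zscore(x):
    """Standardise to mean 0, SD 1 so the two scales are comparable."""
    return (x - x.mean()) / x.std()

# Per-candidate disparity: positive = checklist score high relative to grade
disparity = zscore(scores) - zscore(grades)

# A simple station-level flag: large mean absolute disparity suggests the
# checklist and the global judgement are measuring different things
flag = np.abs(disparity).mean()
```

Stations where this kind of index is large are natural candidates for review, since borderline regression assumes scores and grades move together.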

This paper reports on a study comparing estimates of the reliability of a suite of workplace-based assessment forms used to formatively assess the progress of trainee obstetricians and gynaecologists. The use of such forms of assessment is growing nationally and internationally in many specialties, but there is little research evidence comparing procedures/competencies and form-types across an entire specialty. Generalisability theory combined with a multilevel modelling approach is used to estimate variance components, G-coefficients and standard errors of measurement across 13 procedures and three form-types (mini-CEX, OSATS and CbD).
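The paper's multilevel design across 13 procedures is far richer than can be shown here, but the generalisability-theory core can be sketched for the simplest fully-crossed person × form design: estimate variance components from the two-way mean squares, then form a G-coefficient. All simulated effect sizes below are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n_p, n_i = 50, 8                           # trainees x assessment forms (hypothetical)

# Simulate scores = grand mean + person effect + form effect + residual
person = rng.normal(0, 2, (n_p, 1))        # true person SD = 2
form = rng.normal(0, 1, (1, n_i))          # true form SD = 1
X = 60 + person + form + rng.normal(0, 3, (n_p, n_i))

grand = X.mean()
ms_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum() / (n_i - 1)
resid = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + grand
ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

# Variance components from expected mean squares, then a relative G-coefficient
var_p = (ms_p - ms_res) / n_i              # person (universe-score) variance
var_res = ms_res                           # residual variance
g_coef = var_p / (var_p + var_res / n_i)   # reliability over n_i forms
```

The study's multilevel-modelling approach generalises this by estimating the variance components directly from nested, unbalanced data rather than from balanced mean squares.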
