The Inter-Rater Reliability of Technical Skills Assessment and Retention of Rater Training.

J Surg Educ

Division of General Surgery, Department of Surgery, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; The Ottawa Hospital, Ottawa, Ontario, Canada; Department of Innovation in Medical Education (DIME), University of Ottawa, Ottawa, Ontario, Canada.

Published: July 2020

Background: The inter-rater reliability (IRR) of laparoscopic skills assessment is usually determined in the context of motivated raters from a single subspecialty practice group with significant experience using similar tools. The purpose of this study was to determine the IRR among attending surgeons with differing experience and practice backgrounds, the extent of rater training necessary to achieve good IRR, and whether rater training is retained over periods of nonuse.

Methods: In Part 1, 5 surgeons of different practice backgrounds assessed 3 laparoscopic cholecystectomy videos using the Global Operative Assessment of Laparoscopic Skills instrument. In Part 2, 2 of the surgeons assessed a total of 33 videos over 5 scoring sessions distributed across 6 months. They participated in 2 different training sessions, and retention was tested in the other 3 sessions. IRR was calculated for Parts 1 and 2 with an intraclass correlation (ICC) in a 2-way random-effects model.
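As a rough illustration of the statistic named in the Methods, the following sketch computes an intraclass correlation from a videos-by-raters score matrix. The abstract does not say which ICC form was used, so this assumes the single-rater, absolute-agreement form of the 2-way random-effects model, often written ICC(2,1). The function name `icc_2_1` and the example ratings are illustrative only and are not data from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per rated video, one column per rater.
    Assumption: this ICC form is chosen for illustration; the abstract
    only states "ICC in a 2-way random-effects model".
    """
    n = len(scores)        # number of rated videos (subjects)
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-video mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative ratings: 6 videos (rows) scored by 4 raters (columns)
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))
```

Because the numerator is the between-video mean square minus the residual mean square, the estimate can fall below zero when raters disagree more than the videos differ, which is how a negative value such as the one reported for scoring #2 can arise.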

Results: The ICC for Part 1 was poor (ICC = 0.26). In Part 2, the ICC was highest after each training session (scoring #1 ICC = 0.76, scoring #3 ICC = 0.74). The ICC was not retained 1.5 months after the brief video-based training session (scoring #2 ICC = -0.17). The ICC was retained 2.5 months after the in-depth discussion training session (scoring #4 ICC = 0.70), but not 4.5 months later (scoring #5 ICC = 0.04).

Conclusions: Good IRR is not implicit among surgeons with varying backgrounds and experience. Good IRR can be achieved with different types of rater training, but the impact of rater training is lost in periods of nonuse. This suggests the need for further study of the IRR of technical skills assessment when performed by the wide variety of surgeon raters commonly encountered in postgraduate resident assessment.

Source
http://dx.doi.org/10.1016/j.jsurg.2019.01.001
