Objective: The goals of this study were (1) to assess whether examiner ratings in the American Board of Surgery (ABS) General Surgery Certifying Exam (CE) are biased by the gender, race, or ethnicity of the candidate or the examiner, and (2) to assess whether the exam delivery format, in-person or virtual, affects how examiners rate candidates.
Design: We included every candidate-examiner combination for first-time takers of the general surgery oral exam. Total scores and pass/fail outcomes, both based on the 4 scores given by examiners to candidates, were analyzed using multilevel models with candidates as random effects. Explanatory variables included the gender, race, and ethnicity of candidates and examiners, and the format of the exam (in-person or virtual). Candidates' first-attempt scores on the ABS General Surgery Qualifying Exam (QE) were also included in the models to control for the candidate's baseline knowledge. Because of missing data, three sets of models were evaluated, one for each demographic variable (gender, race, ethnicity). p-values and coefficients of determination (R²) were used to quantify the statistical and practical significance of the model coefficients: a relationship between an explanatory variable and CE scores was considered statistically and practically significant if the p-value was lower than 0.01 and R² was higher than 1%.
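As a rough illustration of the model structure described in the Design, a random-intercept specification for total scores might look like the sketch below. The abstract does not give the exact equation, so every symbol here is an assumption introduced for illustration: y_{ij} is the total score examiner j gave candidate i, QE_i is the candidate's first-attempt QE score, C_i and E_j are the candidate and examiner values of the demographic variable under study, F_i is the exam format, and u_i is the candidate-level random effect. The single interaction term shown is also only an example, since the abstract does not state which interactions were fitted.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{QE}_i + \beta_2\,C_i + \beta_3\,E_j
          + \beta_4\,F_i + \beta_5\,(C_i \times E_j) + u_i + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\end{aligned}
```

Under the same assumptions, the pass/fail models would replace the left-hand side with the log-odds of passing and drop the residual term, which gives a multilevel logistic analogue of this sketch.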
Participants: All first-time takers of the American Board of Surgery General Surgery Certifying Exam from 2016 to 2022 who had demographic data, and the examiners who participated in those exams.
Results: The number of candidates/examiners for the 3 sets of models was 8665/514 (gender), 5906/465 (race), and 4678/295 (ethnicity). The demographic variables, the format of the exam, and their interactions were not found to be significantly related to examiner-candidate ratings or pass/fail outcomes. The only variable significantly related to CE scores was candidates' QE scores, which were added to the models as a measure of candidates' initial knowledge. This held in all models for total scores (gender models: F(1,8659) = 1069.89, p-value < 0.01, R² = 5%; race models: F(1,5696.3) = 589.13, p-value < 0.01, R² = 5%; ethnicity models: F(1,4459.5) = 278.33, p-value < 0.01, R² = 5%) and for pass/fail outcomes (gender models: CI = 1.61-1.73, p-value < 0.01, R² = 3%; race models: CI = 1.67-1.85, p-value < 0.01, R² = 3%; ethnicity models: CI = 2.17-2.90, p-value < 0.01, R² = 3%).
Conclusions: This study found no relationship between candidate or examiner gender, race, or ethnicity and exam outcomes, based on statistical models of examiner-candidate ratings and pass/fail outcomes. In addition, delivering the certifying exam in a virtual format appears to have no statistically significant effect on outcomes compared with in-person delivery. This suggests that the ABS is performing well with respect to both demographic bias and virtual exam delivery.
DOI: http://dx.doi.org/10.1016/j.jsurg.2024.01.001