Objectives: ChatGPT is an artificial intelligence model that can interpret free-text prompts and return detailed, human-like responses across a wide domain of subjects. This study evaluated the extent of the threat posed by ChatGPT to the validity of short-answer assessment problems used to examine pre-clerkship medical students in our undergraduate medical education program.

Methods: Forty problems used in prior student assessments were retrieved and stratified by level of Bloom's Taxonomy. Thirty of these problems were submitted to ChatGPT-3.5. For the remaining 10 problems, we retrieved past minimally passing student responses. Six tutors graded each of the 40 responses. Performance of student-generated and ChatGPT-generated answers, aggregated overall and grouped by Bloom's level of cognitive reasoning, was compared using t-tests, ANOVA, Cronbach's alpha, and Cohen's d. Scores for ChatGPT-generated responses were also compared to historical class average performance.
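The group comparison described above can be sketched as follows. This is a minimal illustration of a two-sample (Welch's) t statistic and Cohen's d, using the Python standard library; the score lists are hypothetical placeholders, not the study's data, and the study's exact analysis pipeline is not specified in the abstract.

```python
# Illustrative sketch of the Methods' score comparison: Welch's t statistic
# and Cohen's d. Score lists are hypothetical, not actual study data.
import math
import statistics

chatgpt_scores = [3.5, 3.0, 4.0, 2.5, 3.5, 3.0]  # hypothetical tutor scores out of 5
student_scores = [2.5, 2.0, 3.0, 2.0, 2.5, 2.5]  # hypothetical minimally passing responses

def welch_t(a, b):
    """Welch's t statistic (does not assume equal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def cohens_d(a, b):
    """Cohen's d effect size using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * statistics.variance(a) +
                           (nb - 1) * statistics.variance(b)) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

t = welch_t(chatgpt_scores, student_scores)
d = cohens_d(chatgpt_scores, student_scores)
```

A positive t and d here would indicate the first group (ChatGPT) scored higher, matching the direction of the reported result; significance testing would additionally require degrees of freedom and a p-value (e.g. via `scipy.stats.ttest_ind` with `equal_var=False`).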

Results: ChatGPT-generated responses received a mean score of 3.29 out of 5 (n = 30, 95% CI 2.93-3.65), compared with 2.38 for a group of students meeting minimum passing marks (n = 10, 95% CI 1.94-2.82), representing higher performance (P = .008, η² = 0.169). However, ChatGPT was outperformed by historical class average scores on the same 30 problems (mean 3.67, P = .018) when all past responses were included regardless of student performance level. There was no statistically significant trend in performance across domains of Bloom's Taxonomy.

Conclusion: While ChatGPT was able to pass short-answer assessment problems spanning the pre-clerkship curriculum, it outperformed only underperforming students. We note that in several cases tutors were convinced that ChatGPT-generated responses had been written by students. Risks to assessment validity include uncertainty in identifying struggling students and an inability to intervene in a timely manner. ChatGPT's performance on problems with increasing demands on cognitive reasoning warrants further research.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10540597
DOI: http://dx.doi.org/10.1177/23821205231204178

