Objective: Large language models such as ChatGPT have been used in different fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared with that of neurosurgery residents.
Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha set at 0.05.
Results: After screening, six studies were selected for qualitative and quantitative analysis. Accuracy of ChatGPT ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in 4 of the 6 studies reviewed; the remaining 2 had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on subgroup analysis of studies that used the Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest-weighted study shifted the results in favor of ChatGPT.
Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement is necessary before ChatGPT can become a useful and reliable supplementary tool in the delivery of neurosurgical education.
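As an illustration of the pooling step named in the Methods, the following is a minimal sketch of inverse-variance fixed-effects pooling with the I² heterogeneity statistic. It assumes each study is summarized as a log odds ratio computed from counts of correct answers, which the abstract does not confirm. The function names and the counts in the demo are hypothetical and are not taken from the reviewed studies.

```python
import math


def log_odds_ratio(correct_a, total_a, correct_b, total_b):
    """Return (log odds ratio, variance) for group A versus group B.

    A 0.5 continuity correction is added to every cell (an assumption
    made here so that zero cells do not break the logarithm).
    """
    a = correct_a + 0.5
    b = total_a - correct_a + 0.5
    c = correct_b + 0.5
    d = total_b - correct_b + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def fixed_effects_pool(effects, variances, z=1.96):
    """Inverse-variance fixed-effects pooling with Cochran's Q and I^2."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - z * se,
        "ci_high": pooled + z * se,
        "q": q,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical counts: (residents correct, total, ChatGPT correct, total).
    studies = [(70, 100, 60, 100), (350, 500, 260, 500), (130, 200, 150, 200)]
    pairs = [log_odds_ratio(*s) for s in studies]
    result = fixed_effects_pool([y for y, _ in pairs], [v for _, v in pairs])
    print(result)
```

A pooled log odds ratio above zero would favor the first group under this convention. An I² value near 96%, as reported above, indicates that most of the observed variation reflects between-study differences rather than sampling error.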
DOI: http://dx.doi.org/10.1007/s10143-024-03144-y
Neurosurg Rev
December 2024
Division of Neurosurgery, Department of Neurosciences, College of Medicine and Philippine General Hospital, University of the Philippines Manila, Manila, Philippines.
Acad Med
May 2015
L.N. Peterson is adjunct professor, Department of Cellular Physiological Sciences, and senior evaluation advisor, Evaluation Studies Unit, Medical Education, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada. S.A. Rusticus is statistical analyst, Evaluation Studies Unit, Medical Education, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada. L.P. Ross is senior psychometrician, Scoring Services Unit of Professional Services, National Board of Medical Examiners, Philadelphia, Pennsylvania.
J Dent Educ
December 2003
Department of Testing Services, American Dental Association, Chicago, IL 60611-2678, USA.
Successful completion of Part II of the National Board Dental Examinations is a part of the licensure process for dentists. Good testing practice requires that the content of a high stakes examination like Part II be based on a strong relationship between the content and the judgments of practicing dentists on what is important to their practice of dentistry. In an effort to demonstrate this relationship for Part II, the Joint Commission conducted a practice analysis, which involved a two-dimensional model.