Background: Recent studies, including those by the National Board of Medical Examiners (NBME), have highlighted the remarkable capabilities of large language models (LLMs) such as ChatGPT in passing the United States Medical Licensing Examination (USMLE). However, detailed analyses of these models' performance in specific medical content areas are lacking, limiting assessment of their potential utility for medical education.
Objective: To assess and compare the accuracy of successive ChatGPT versions (GPT-3.5 and GPT-4) across USMLE content areas.